Abstract. Concurrent garbage collectors are notoriously difficult to implement correctly. Previous approaches to the issue of producing correct collectors have mainly been based on posit-and-prove verification or on the application of domain-specific templates and transformations. We show how to derive the upper reaches of a family of concurrent garbage collectors by refinement from a formal specification, emphasizing the application of domain-independent design theories and transformations. A key contribution is an extension to the classical lattice-theoretic fixpoint theorems to account for the dynamics of concurrent mutation and collection.

1 Introduction

Concurrent collectors are extremely complex and error-prone. Since such collectors now form part of the trusted computing base of a large portion of the world's mission-critical software infrastructure, such unreliability is unacceptable [31]. It is therefore a worthwhile, if not mandatory, endeavor to provide means by which the quality of such software can be improved without harming the productivity of the programmers.

The latter aspect is still a major obstacle in verification-oriented systems. Interactive theorem provers may need thousands of lines of proof scripts or hundreds of lemmas in order to cope with serious collectors (see e.g. [20, 25, 9]). Fully automated verifiers exhibit problems as well: as can be seen e.g. in [13], even the verification of a simplified collector necessitates such a large number of complex properties that the specification may easily become faulty itself.

These considerations point to a first mandatory prerequisite for the development of correct software of realistic size and complexity: not only the software but also its correctness proof needs to be modularized. However, such a modularization is not enough. Even when it has been successfully verified that all requested properties are fulfilled by the software, it remains open whether these properties taken together do indeed specify the intended behavior. This is an external judgment that lies outside of any verification system. Evidently, such judgments are easier and more trustworthy when the properties are few, simple and easy to grasp.

Finally there is a third aspect which needs to be addressed by a development methodology. 
Garbage collectors - like most software products - come in a plethora of possible variations, 
each addressing specific quality or efficiency goals. When each of these variations is verified 
separately, a tremendous duplication of work is generated. On the other hand, for a slightly modified algorithm it is extremely difficult to determine which properties and proofs can remain unchanged, which are superfluous and which need to be added or redone.

We propose here a development method which addresses the aforementioned issues and which has already been applied successfully to complex problems, for example real-world-size planning and scheduling tasks [5, 27]. The method is based on the concept of specification refinement. Two major aspects of this concept are illustrated in Figures 1 and 2.



1.1 Sequential vs. Concurrent Garbage Collection 

The very first garbage collectors, which essentially go back to McCarthy's original design [19], were stop-the-world collectors. That is, the Mutator was suspended completely while the Collector did its recycling. This approach leads to potentially very long pauses, which are nowadays considered to be unacceptable.

The idea of having the Collector run concurrently with the Mutator goes back to the seminal 
papers of Dijkstra et al. [8] and Steele [28] (which were followed by many other papers trying to 
improve the algorithm or its verification). The Doligez-Leroy-Gonthier algorithm (short: DLG) 
that was developed for the Concurrent CAML Light system [9, 10], is considered an important 
milestone, since it not only takes many practical complications of real-world collectors into 
account, but also generalizes from a single Mutator to many Mutators. 

The transition to concurrent garbage collection necessitates a trade-off between the precision of 
the Collector and the degree of concurrency it provides [31]: the higher the degree of concurrency, 
the more garbage nodes will be overlooked. However, this is no major concern in practice, since 
the escaped garbage nodes will be found in the next collection cycle. 



1.2 Abstract and Concrete Problems 

Figure 1 describes the way in which we come from abstract problems to concrete solutions. 
(1) Suppose we have an abstract problem description, that is, a collection of types, operations 
and properties that together describe a certain problem. (2) For this abstract problem we then 
develop an abstract solution, that is, an abstract implementation that fulfills all the requested 
properties. (3) When we now have a concrete problem that is an instance of our abstract problem 
(since it meets all its properties), then we can (4) automatically derive a concrete solution by 
instantiating the abstract solution correspondingly. Ideally, the abstract problem/solution pairs can even be found in a library such as the one provided by the Specware system [16].




[Figure 1 (diagram): abstract problem, refined into an abstract solution (lattices / cpos); concrete problem, refined into a concrete solution (graphs & sets).]

Fig. 1. Abstract and concrete problems and their solutions



For example, in the subsequent sections of this paper we will consider the abstract problem of finding fixed points in lattices or cpos, together with several solutions for this problem. Then we will show that garbage collection is an instance of this abstract problem by considering the concrete graphs and sets as instances of the more abstract lattices. This way our abstract solutions carry over to concrete solutions for the garbage collection problem.

Technically, all our problem and solution descriptions are algebraic and coalgebraic specifications (as will be defined more precisely later), which are usually underspecified and thus possess many models. "Solutions" are treated as borderline cases of such specifications, which are directly translatable into code of some given programming language. (This concept has nowadays been popularized as "automatic code generation from models".) The formal connections between the various specifications are given by certain kinds of refinement morphisms, and the derivation of the concrete solution from the other parts is formally a pushout construction from category theory.¹

Figure 1 also illustrates another aspect of our methodological way of proceeding. When we are confronted with a concrete problem, we try to extract from it a more abstract problem that represents the core of the given task. Even though this looks like additional effort at first sight, it is usually a worthwhile endeavor. First of all, we obtain the desired modularization of the derivation and verification. Secondly, the concentration on the kernel of the problem usually simplifies finding the (abstract) solutions. And last but not least, we can often come up with variations on the theme that would otherwise have been buried under the bulk of details. As Figure 1 indicates, the introduction of the details of the concrete problem can be done almost automatically and thus does not really cause additional work.

This principle of working with an abstract view of the concrete problem can also be found in other approaches, for example in [20, 13]. But there the principle is used more implicitly (in statements such as "Correctness means that each of these procedures faithfully represent the abstract state" [13]), whereas we make the abstraction/concretization into an explicitly available development tool, based on a rigorous notion of morphisms.

1.3 Development by Refinement 

Figure 2 illustrates the second essential aspect of our method. We do not work with a single 
problem/solution pair and their concrete instances. Rather we construct a whole "family tree" 
(which actually may be a dag) of more and more refined problems, each giving rise to more and 
more refined solutions. On the problem side "refined" essentially means that we have additional 
properties, on the solution side "refined" essentially means that we have better algorithms, e.g. 
more efficient, more robust, more concurrent etc. 

This way of proceeding has the primary advantage that it allows us to reuse verification and 
development efforts. Suppose that at some point in the tree we want to design a new variation. 
This is reflected in a new refinement child of the current specification, to which certain properties 
are added. In the accordingly modified new solution we need only prove those properties that 
have been added; everything else is inherited. 

¹ A morphism from specification S to specification T is given by a type-consistent mapping of the type, function, and predicate symbols of S to derived types, functions, and predicates in T. The mapping is a specification morphism if the axioms of S translate to theorems of T. A pushout construction is used to compose specifications. More details on the category of specifications may be found in [26, 16, 22].
Fig. 2. Refinement of problems (and solutions)
More details about the method sketched above can be found in [26]. The remainder of this paper will make things more precise by presenting concrete examples.

In an earlier paper [22] we presented one exemplary development of a garbage collector from an initial non-executable specification to an executable implementation. But, as was critically noted in [31], we did "not explore an algorithm space". Such an exploration is the main purpose of the present paper. This goal is similar to that of Vechev et al. [31]: they start from a generic algorithm, which is parameterized by an underspecified function, such that different instantiations of this function lead to different collection algorithms. A primary concern of [31] is the possibility of combining various "design dimensions" in a very flexible way. In contrast to their approach, we study the family tree of specifications and implementations that can be systematically derived using formal refinements. (The interchangeability of some of these refinements actually turns the family tree into a family dag.) So our focus is on the method of refinement and its potential tool support, and not on garbage collection as such. Moreover, whereas both our earlier paper [22] and the work of Vechev et al. [31] mostly concentrate on one phase of garbage collection, namely the marking phase, the present paper addresses the whole task of garbage collection. In addition, we consider not only mark-and-sweep collectors but also copying collectors. Last but not least, we base the whole treatment of garbage collection on very fundamental mathematical principles, namely lattices and fixed points.



1.4 Data Reification 

The final efficiency of most practical garbage collection algorithms depends on the use of clever data representations. Standard techniques range from the classical stacks or queues to bit maps, overlaid pointers, so-called dirty bits, color toggling and so forth.

In our approach all these designs fall under the paradigm of data reification morphisms. This means that we can work throughout our developments with high-level abstract data structures such as graphs and sets in order to specify and verify the algorithmic aspects in the clearest possible way. Only at the end of the derivation are the high-level data structures implemented by concrete data structures, which are chosen based on their efficiency in the given context. In systems like Specware [16] this step is largely automatic, including many low-level optimizations. Since it is very technical and can be done almost automatically by advanced systems, we touch on this part only briefly here.
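As a small illustration of data reification (a Python sketch of our own, not part of the Specware development), a marking routine can be written once against an abstract "set of marked nodes" interface, and the set can then be reified as a bit map. The names `SetMarks`, `BitmapMarks` and `mark_reachable` are hypothetical, and nodes are assumed to be numbered 0 to n-1.

```python
class SetMarks:
    """Abstract representation: the marked nodes as a Python set."""

    def __init__(self) -> None:
        self._marked: set[int] = set()

    def mark(self, v: int) -> None:
        self._marked.add(v)

    def is_marked(self, v: int) -> bool:
        return v in self._marked


class BitmapMarks:
    """Concrete representation: one bit per node, packed into an int."""

    def __init__(self) -> None:
        self._bits = 0

    def mark(self, v: int) -> None:
        self._bits |= 1 << v

    def is_marked(self, v: int) -> bool:
        return (self._bits >> v) & 1 == 1


def mark_reachable(succ: list[list[int]], roots: list[int], marks) -> list[int]:
    """Depth-first marking that only uses mark/is_marked, so either representation fits."""
    stack: list[int] = []
    for r in roots:
        if not marks.is_marked(r):
            marks.mark(r)
            stack.append(r)
    while stack:
        a = stack.pop()
        for b in succ[a]:
            if not marks.is_marked(b):
                marks.mark(b)
                stack.append(b)
    return [v for v in range(len(succ)) if marks.is_marked(v)]


# Node 3 points to node 0 but is itself unreachable from the root 0.
SUCC = [[1], [2], [], [0]]
print(mark_reachable(SUCC, [0], SetMarks()))     # [0, 1, 2]
print(mark_reachable(SUCC, [0], BitmapMarks()))  # [0, 1, 2]
```

Only the two small classes differ; the marking routine, and any argument about its correctness, is untouched by the change of representation.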

1.5 Summary of Results 

We present a methodology that allows us to derive a wide variety of garbage collection algorithms 
in a systematic way. This approach not only modularizes the resulting programs but also the 
derivation process itself such that the verification is split into small and easy-to-comprehend 
pieces, allowing considerable reuse of proofs. In more detail, we present the following results: 

- We start by presenting a "dynamic" generalization of the well-known fixed-point results of 
lattice theory. 

- This basis is presented as a general specification that covers a whole range of implementa- 
tions. We call this the "micro-step" approach. 

- These fixpoint-based specifications can be refined further to more and more detailed designs, 
which correspond to the major algorithms found in today's literature. 

Even though we cannot present all the algorithms in full detail, we can at least show "in principle" how a whole variety of important and practical algorithms come out of our refinement process. These include above all the DLG algorithm of Doligez, Leroy and Gonthier [9], which is sometimes considered the culmination of concurrent collector development [1], and its descendants.

2 Notes on Garbage Collection 

Even though our approach starts from very abstract and high-level mathematical concepts, viz. lattices and fixed points (Section 3), and takes several refinement steps (Section 4) before it ends with some special aspects of garbage collection (Section 5), it is helpful to motivate our main design decisions by keeping the concrete application of garbage collection in mind. We distinguish three major stages of refinement:

1. We start from a "purely mathematical problem", namely lattices (actually cpos) and fixed points. On this level we derive the core properties that mark off the solution space.

2. Then we proceed to "abstract garbage collection"; that is, we model the problem by graphs 
and sets. This intermediate stage can on the one hand be easily shown to be an instance of 
the lattice-based abstraction; but on the other hand it already refers to important aspects 
of the concrete garbage collection problem. Hence, all algorithmically relevant aspects can 
be dealt with on this level. 

3. The final step introduces the various specialized data structures, write barriers and the like 
that go to make a realistic garbage collector. (This final step will only be roughly sketched 
in this paper.) 
[Figure 3 (diagram): the components Mutator, Store and Collector; the specification fragments shown include the operation addNew: . . . and the invariant $live = active \uplus supply$.]

Fig. 3. The system architecture



2.1 Architecture and Basic Terminology 

Before we delve into the formal derivation we want to clarify the basic setting and the terminology 
that we use here. This is best done at the intermediate level of abstraction, where the garbage 
collection problem is formulated in terms of graphs and sets. 

We modularize the problem by way of three kinds of components (using a UML-inspired representation; see Figure 3). The Mutators represent the activities of all programs that use the heap. These activities are based on primitive operations that are provided by the component Store, which represents the memory management system (as part of the runtime system or operating system). Finally, the task of garbage collection is performed by a component Collector.

The Mutator operates on a graph, which is a data structure of type Graph. It can essentially perform three primitive operations:²

- addArc(a, b): add a new arc between two nodes a and b.

- delArc(a, b): delete the arc between a and b. This may have the effect that b and other nodes reachable from b become unreachable ("garbage").

² This considerably simplifies the memory model used in the famous DLG algorithm [9, 10], where the Mutator has eight operations. However, the essence of these operations is captured by our three operations above. (We will come back to this issue in Section 5.)
- addNew(a): allocate a new node b (from the freelist) and attach it by an arc from a. This reflects the fact that in reality alloc operations return a pointer, which is stored in some field (variable, register, heap cell) of the Mutator. Hence, the new node is immediately linked to the Mutator's graph.

The Store provides the low-level interface to the actual memory-access operations.³ But on this abstract level its specification also provides the basic terminology that is needed for talking about garbage collection. In particular, we use the following sets:

- active are those nodes that constitute the Mutator's graph. 

- supply are the nodes in the freelist. (They become active through the operation addNew.) 

- live is a shorthand for the union of the active and supply nodes. 

- dead are the garbage nodes that are neither reachable from the Mutator nor in the freelist. (Nodes may become dead through the operation delArc.)

Note that the specifications in Figure 3 use $A = B \uplus C$ as a shorthand notation for the two properties $A = B \cup C$ and $B \cap C = \emptyset$. They also use overloading of operation names. For example, active is used both for the subgraph that constitutes the Mutator's view and for the set of nodes in this subgraph. Such overloaded symbols must always be distinguishable from their context.

Note also that we frequently refer to the "set" Arcs of the arcs of a graph and also to the "set" sucs(a) of all successors of a node a; but these are actually multisets, since two nodes may be connected by several arcs. (Technically, a cell may have several slots that all point to the same cell.)

2.2 Fundamental Properties of the Mutator 

The Mutator's operations addArc, delArc, addNew have an invariant property that is decisive 
for the working of any kind of garbage collector: being garbage is a stable property [1]. 

Proposition 1 (Antitonicity of Mutator). A Mutator can never "escape" the realm given by its graph and the freelist; that is, it can never reclaim dead garbage nodes. In other words, under the Mutator's operations the set of live nodes (graph + freelist) decreases monotonically: it may shrink, but it never grows.
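To see Proposition 1 at work, the following Python sketch (a toy model of our own, not a proof) applies random addArc, delArc and addNew operations and checks after each one that the set of live nodes never grows. The class `Store` and the function `random_run` are hypothetical; the sketch assumes a single root node and models the freelist as a plain list of nodes without arcs.

```python
import random


def reachable(succ: dict[int, list[int]], root: int) -> set[int]:
    """Nodes reachable from root (including root itself)."""
    seen, stack = {root}, [root]
    while stack:
        a = stack.pop()
        for b in succ[a]:
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


class Store:
    """Toy store: `root` is the Mutator's entry point, all other nodes start in the freelist."""

    def __init__(self, size: int, root: int = 0) -> None:
        self.root = root
        self.succ: dict[int, list[int]] = {v: [] for v in range(size)}  # multisets as lists
        self.supply: list[int] = [v for v in range(size) if v != root]

    def active(self) -> set[int]:
        return reachable(self.succ, self.root)

    def live(self) -> set[int]:
        return self.active() | set(self.supply)

    def add_arc(self, a: int, b: int) -> None:
        assert a in self.active() and b in self.active()
        self.succ[a].append(b)

    def del_arc(self, a: int, b: int) -> None:
        self.succ[a].remove(b)  # removes one occurrence of the arc

    def add_new(self, a: int) -> int:
        assert a in self.active() and self.supply
        b = self.supply.pop()
        self.succ[b] = []  # a freshly allocated cell has no outgoing pointers
        self.succ[a].append(b)
        return b


def random_run(steps: int = 500, size: int = 20, seed: int = 1) -> set[int]:
    """Apply random Mutator operations and check that the live set never grows."""
    rng = random.Random(seed)
    store = Store(size)
    previous = store.live()
    for _ in range(steps):
        candidates = sorted(store.active())
        a = rng.choice(candidates)
        op = rng.randrange(3)
        if op == 0:
            store.add_arc(a, rng.choice(candidates))
        elif op == 1 and store.succ[a]:
            store.del_arc(a, rng.choice(store.succ[a]))
        elif op == 2 and store.supply:
            store.add_new(a)
        current = store.live()
        assert current <= previous, "the Mutator must never enlarge the live set"
        previous = current
    return previous
```

Resetting the successors of a freshly allocated node matters: if a recycled cell kept stale pointers, addNew could make dead nodes reachable again, which is exactly what the proposition rules out.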

2.3 The Fundamental Specification of Garbage Collection 

Surprisingly often papers on garbage collection refer to an intuitive understanding of what the 
Collector shall achieve. But in a formal treatment we cannot rely on intuition; rather we have 
to be absolutely precise about the goal that we want to achieve. 

Consider the architecture sketched in Figure 3. The Mutator continuously performs its basic operations addArc, delArc and addNew, which, from the Mutator's viewpoint, are all considered to be total functions; i.e. they return a defined value on all inputs. This is trivially so for addArc and delArc, since their arguments already exist in the Mutator's graph. The problematic operation is addNew, since this operation needs an element from the freelist. However, the freelist may be empty (i.e. $supply = \emptyset$). In this situation there are two possibilities:

- $|active| = MemorySize$. That is, the Mutator has used all available memory in its graph. Then nothing can be done!

- $|active| < MemorySize$. When $supply = \emptyset$, this means that $dead \neq \emptyset$. This is the situation in which we want to recycle dead garbage cells into the freelist. And this is the Collector's reason for existence!

³ This reflects the situation of many modern systems, ranging from functional languages like ML or Haskell to object-oriented languages like C# or Java. In languages like C or C++ the situation is more intricate.

Based on this reasoning, we obtain two basic principles for the Mutator/Collector paradigm.

Assumption 1 (Boundedness of Mutator's graph) $|Mutator.graph| < MemorySize$

Under this global assumption the Collector has to ensure that the operation addNew is a total function (which may at most be delayed). This can be cast into a temporal-logic formula:

Goal 2 (Specification of Collector) $\Box\Diamond\,(supply \neq \emptyset)$ (provided Assumption 1 holds)

This is a liveness property stating that "at any point in time the freelist (may be empty but) will eventually be nonempty." When this condition is violated, that is, $supply = \emptyset$, then it follows from the global Assumption 1 that $dead \neq \emptyset$. Hence the Collector has to find at least some dead nodes, which it can then transfer to the freelist. This can be cast into an operation recycle with the initial specification given in Figure 4.



SPEC Collector

recycle: Graph(Node, Arc) → Set(Node)

$\emptyset \subset recycle(G) \subseteq dead$   if $dead \neq \emptyset$

$\emptyset = recycle(G)$   if $dead = \emptyset$

Fig. 4. The Collector's task

Hence we should design the system such that the following property holds (using an ad-hoc notation for transitions).

Goal 3 (Required actions of Collector) $\Box\Diamond\,(supply \rightarrow supply \uplus recycle(G))$

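When Goal 3 is met, the original Goal 2 is also guaranteed to hold: whenever the freelist is empty, Assumption 1 yields $dead \neq \emptyset$, and by Proposition 1 the Mutator cannot make dead nodes live again, so the next recycling step adds a nonempty set to the freelist. In other words, the Collector has to periodically call recycle and add the found subset of the garbage nodes to the freelist.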

Note that the above operation can happen at any point in time; we need not wait until the freelist 
is indeed empty. (This observation leaves considerable freedom for optimized implementations 
which are all correct.) 
2.4 How to Find Dead Nodes

Unfortunately, the specification of recycle in Figure 4 is not easily implementable, since the dead nodes are not directly recognizable. Since the dead nodes are computed by taking the complement of the live nodes (i.e. $dead = \complement live = nodes \setminus live$), the idea comes to mind to work with the complement of recycle. This leads to the simple calculation

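$$
\begin{aligned}
& \emptyset \subset recycle(G) \subseteq dead \\
\Leftrightarrow\ & \complement\emptyset \supset \complement recycle(G) \supseteq \complement dead \\
\Leftrightarrow\ & nodes \supset \complement recycle(G) \supseteq live \\
\Leftrightarrow\ & nodes \supset trace(G) \supseteq live
\end{aligned}
$$

where we introduce a new function trace(G) such that $recycle(G) = \complement trace(G)$.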
This leads to the refined version of the Collector's specification in Figure 5. 



SPEC Collector

recycle: Graph(Node, Arc) → Set(Node)

trace: Graph(Node, Arc) → Set(Node)

$recycle(G) = \complement trace(G)$

$live \subseteq trace(G) \subset nodes$   if $dead \neq \emptyset$

$trace(G) = nodes$   if $dead = \emptyset$

Fig. 5. The Collector's task (first refinement)



Note that this specification, which will form the starting point for our more detailed derivation, 
is formally derived from the fundamental requirements for garbage collection as expressed in 
Assumption 1 and Goal 2 above! 
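The following Python sketch (an illustration under our own assumptions, not code from the paper) computes one admissible trace(G), namely plain reachability from the roots, and derives recycle(G) as its complement; the hypothetical helper `satisfies_fig5` checks the two conditions of Figure 5. The toy store has no freelist, so the roots are just the Mutator's root A. The further calls show that the specification deliberately admits conservative traces that retain some dead nodes.

```python
def trace(succ: dict[str, list[str]], roots: set[str]) -> set[str]:
    """Least admissible trace: exactly the nodes reachable from the roots."""
    seen, stack = set(roots), list(roots)
    while stack:
        a = stack.pop()
        for b in succ.get(a, []):
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def recycle(nodes: set[str], traced: set[str]) -> set[str]:
    """recycle(G) is the complement of trace(G) within the store."""
    return nodes - traced


def satisfies_fig5(nodes: set[str], live: set[str], traced: set[str]) -> bool:
    dead = nodes - live
    if dead:
        return live <= traced < nodes  # live <= trace(G) < nodes (proper subset)
    return traced == nodes             # trace(G) = nodes


NODES = {"A", "B", "C", "D", "E"}
SUCC = {"A": ["B"], "B": ["C"], "D": ["E"]}  # D and E are unreachable from the root A
LIVE = trace(SUCC, {"A"})                     # {"A", "B", "C"}

print(sorted(recycle(NODES, LIVE)))               # ['D', 'E']
print(satisfies_fig5(NODES, LIVE, LIVE))          # True
print(satisfies_fig5(NODES, LIVE, LIVE | {"D"}))  # True: conservative, recycles only E
print(satisfies_fig5(NODES, LIVE, NODES))         # False: would recycle nothing
```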

2.5 An Intuitive Example and a Subtle Bug 

We demonstrate the working of a typical garbage collection algorithm by a simple example. 
Figure 6 illustrates the situation at the beginning of the Collector by showing a little fragment 
of the store; solid nodes are reachable from the root A, dashed circles represent dead garbage 
nodes (the arcs of which are not drawn here for the sake of readability). We use the metaphor 
of "planes" to illustrate both mark-and-sweep and copying collectors. In the former, "lifting" 
a node to the upper plane means marking, in the latter it means copying. The picture already 
hints at a later generalization, where the store is partitioned into "regions". 

Figure 7 shows an intermediate snapshot of the algorithm. Some nodes and arcs are already lifted (i.e. marked or copied), others have not been considered yet. The gray nodes are in the "hot zone", the so-called "workset", which means that they are marked/copied, but not all of their outgoing arcs have been handled yet.

Figure 8 shows the next snapshot. Now all direct successors of A have been treated. Therefore 
A is taken out of the workset - which we represent by the color black. 



Fig. 6. At the start of the Collector 




Fig. 8. The next snapshot 
Note that we have the invariant property (which will play a major role in the sequel) that all downward arrows start in the workset. This corresponds to one of the two main invariants in the original paper of Dijkstra et al. [8].

A subtle problem: now let us assume that at this moment the Mutator intervenes by adding an arc A → E and then deleting the arc D → E. This leads to the situation depicted in Figure 9. Since A is no longer in the workset, its connection to E will not be detected. Hence, E is hidden from the Collector [31] and will therefore be treated as a dead garbage node, which is a severe bug!




Any formal method for deriving garbage collection algorithms must ensure that this bug cannot happen! Note that this situation violates the invariant about the downward arrows. And our formal treatment will show that keeping this invariant intact is actually the key to the derivation of correct garbage collectors.

There are three reasonable ways to cope with this problem (using suitable write barriers): 

- When performing addArc(A, E), record E. (This is the approach of Dijkstra et al. [8].)

- When performing addArc(A, E), record A. (This is the approach taken by Steele [28].)

- When performing delArc(D, E), record E. (This is the approach taken by Yuasa [32].)

Vechev et al. [31] speak of "installation-protected" in the first two cases and of "deletion- 
protected" in the last case. 
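The bug and the three remedies can be replayed in a small Python sketch. It is a simplified, single-threaded model of our own: the class `Heap` and the function `scenario` are hypothetical, "recording" a node is modeled as coloring it gray (putting it into the workset), the barriers are always enabled, and atomicity issues of real collectors are ignored. The graph A → B, A → D, D → E mimics the situation after A has been scanned.

```python
from typing import Optional

WHITE, GRAY, BLACK = "white", "gray", "black"


class Heap:
    """Toy tricolor marking: gray nodes form the workset, black nodes are fully scanned."""

    def __init__(self, succ: dict[str, list[str]], roots: list[str],
                 barrier: Optional[str] = None) -> None:
        self.succ = {v: list(ws) for v, ws in succ.items()}
        self.color = {v: WHITE for v in succ}
        self.barrier = barrier  # None, "dijkstra", "steele" or "yuasa"
        for r in roots:
            self.color[r] = GRAY

    def shade(self, v: str) -> None:
        if self.color[v] == WHITE:
            self.color[v] = GRAY

    # Mutator operations, with an optional write barrier
    def add_arc(self, a: str, b: str) -> None:
        if self.barrier == "dijkstra":
            self.shade(b)              # record the new target E
        elif self.barrier == "steele" and self.color[a] == BLACK:
            self.color[a] = GRAY       # record the source A: it will be rescanned
        self.succ[a].append(b)

    def del_arc(self, a: str, b: str) -> None:
        if self.barrier == "yuasa":
            self.shade(b)              # record the old target E
        self.succ[a].remove(b)

    # Collector operations
    def step(self) -> bool:
        """Scan one gray node; return False once the workset is empty."""
        gray = sorted(v for v, c in self.color.items() if c == GRAY)
        if not gray:
            return False
        v = gray[0]                    # deterministic choice for the example
        for w in self.succ[v]:
            self.shade(w)
        self.color[v] = BLACK
        return True

    def finish(self) -> set[str]:
        """Run marking to completion and return the white nodes a sweep would free."""
        while self.step():
            pass
        return {v for v, c in self.color.items() if c == WHITE}


def scenario(barrier: Optional[str]) -> set[str]:
    heap = Heap({"A": ["B", "D"], "B": [], "D": ["E"], "E": []}, roots=["A"], barrier=barrier)
    heap.step()             # Collector scans A: B and D turn gray, A turns black
    heap.add_arc("A", "E")  # Mutator adds A -> E ...
    heap.del_arc("D", "E")  # ... and deletes D -> E; E stays reachable via A
    return heap.finish()


print(scenario(None))       # {'E'}: the live node E would be freed
for b in ("dijkstra", "steele", "yuasa"):
    print(b, scenario(b))   # set() in all three cases
```

Without a barrier the black node A ends up pointing to the white node E, which is precisely a violation of the invariant about downward arrows; each of the three barriers restores it by putting either E or A back into the workset.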

Note also that this bug may appear in an even subtler way during the handling of a node in the workset. Consider the node C in Figure 8 and suppose that the Collector has already lifted the first two of its three arcs. At this moment the Mutator redirects the first pointer field to, say, E. A naive Collector will nevertheless take node C out of the workset (color it black) once its final arc has been treated. The same bug again!

At this point we leave the concrete considerations about garbage collectors and pass on to the more abstract viewpoint of fixed points and lattices (or cpos). In the terminology of Figure 1 we follow the upward arrow, that is, we generalize a concrete problem into a more abstract one. Once this is done, we can derive a whole variety of solutions in a strictly top-down fashion.
3 Mathematical Foundation: Fixed Points

In garbage collection one can roughly distinguish two classes of collectors (see Section 1.1): 

- Stop-the-world collectors: these are the classical non-concurrent collectors, where the muta- 
tors need to be stopped, while the collector works. 

- Concurrent collectors: these are the collectors that allow the mutators to keep working 
concurrently with the collector (except for very short pauses). 

As we will demonstrate in a moment, the traditional stop-the-world collectors correspond on the abstract level to classical fixed-point theory. For the concurrent collectors we need to generalize this classical fixed-point theory to a variant that we call "dynamic fixed points".

We briefly review the classical theory before we present our generalization.

3.1 Classical Fixed Points (Stop-the-world Collectors)

The best known treatments of the classical fixpoint problem in complete lattices are those 
of Tarski [29] and Kleene [17]. Before we quote these we present some relevant terminology 
(assuming that the reader is already familiar with the very basic notions of partial order, join, 
meet etc.) 

- For a set $s = \{x_0, x_1, x_2, \ldots\}$ of type Set(A) and a function $f: A \to A$ we use the overloaded function $f: Set(A) \to Set(A)$ by writing $f(s)$ as a shorthand for $\{f(x_0), f(x_1), f(x_2), \ldots\}$. (In functional-programming notation this would be written with the apply-to-all operator as $f * s$.)

- A function $f: A \to A$ is monotone if $x \le y \Rightarrow f(x) \le f(y)$ holds.

- The function $f$ is continuous if $f(\bigsqcup\{x_0, x_1, x_2, \ldots\}) = \bigsqcup\{f(x_0), f(x_1), f(x_2), \ldots\}$ holds. This could be written briefly as $f \circ \bigsqcup = \bigsqcup \circ f$ (by using the overloaded versions of the symbol $f$).

- The function $f$ is inflationary in $x$ if $x \le f(x)$ holds. Then $x$ is called a post-fixed point of $f$. (Analogously for pre-fixed points.)

- The element $x$ is called a fixed point of $f$ if $x = f(x)$ holds; $x$ is the least fixed point if $x \le y$ for any other fixed point $y$ of $f$.

- The element $x$ is called a fixed point of $f$ relative to $r$ if $x = f(x) \wedge r \le x$ holds.

- By $f^*(x) = \mathrm{LEAST}\ u.\ u = f(u) \wedge x \le u$ we denote the reflexive-transitive closure of $f$ (when it exists); i.e. the function that yields the least fixed point of $f$ relative to $x$.

Lemma 4 (Properties of the closure $f^*$). The closure $f^*(x)$ has a number of properties that we will utilize frequently:

- $x \le f^*(x)$ (inflationary);

- $f^*(f^*(x)) = f^*(x)$ (idempotent);

- $f(f^*(x)) = f^*(x)$ (fixpoint);

- $f^*(f(x)) = f^*(x)$ if $x \le f(x)$.
Proof. The first three properties follow directly from the definition of $f^*$. The last one can be seen as follows: Denote $u = f^*(x)$ and $v = f^*(f(x))$. Then we have by monotonicity of $f^*$

$x \le f(x) \vdash f^*(x) \le f^*(f(x)) \vdash u \le v.$

On the other hand we have

$x \le u \vdash f(x) \le f(u) = u.$

Since $v$ is the least value with $v = f(v) \wedge f(x) \le v$, we have $v \le u$. (End of proof)
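Lemma 4 can be checked exhaustively on a small finite lattice. The Python sketch below is our own illustration: it takes the powerset of four nodes, a hypothetical monotone but non-inflationary function `f` (a constant root 0 plus all successors), computes $f^*$ by brute force directly from its LEAST definition, and checks the four properties for every post-fixed point $x \le f(x)$, where $f^*(x)$ is guaranteed to exist.

```python
from itertools import combinations

NODES = frozenset(range(4))
SUCS = {0: [1], 1: [2], 2: [], 3: [3]}


def f(s: frozenset) -> frozenset:
    """Monotone, but not inflationary: the constant root 0 plus all successors of s."""
    return frozenset({0}) | frozenset(b for a in s for b in SUCS[a])


def subsets(universe: frozenset):
    items = sorted(universe)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def closure(x: frozenset):
    """f*(x) by brute force: the least fixed point of f above x, or None if there is none."""
    fixed = [u for u in subsets(NODES) if u == f(u) and x <= u]
    least = [u for u in fixed if all(u <= v for v in fixed)]
    return least[0] if least else None


def check_lemma4() -> int:
    """Check the four properties for every post-fixed point x; return how many were checked."""
    checked = 0
    for x in subsets(NODES):
        if not x <= f(x):
            continue
        c = closure(x)
        assert c is not None
        assert x <= c               # inflationary
        assert closure(c) == c      # idempotent
        assert f(c) == c            # fixpoint
        assert closure(f(x)) == c   # f*(f(x)) = f*(x), since x <= f(x)
        checked += 1
    return checked


print(check_lemma4())  # 8
```

Here $f$ has exactly two fixed points, {0, 1, 2} and {0, 1, 2, 3}, so $f^*$ picks the first one unless $x$ already contains node 3.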

Theorem 1 (Tarski). Let L be a complete lattice and $f: L \to L$ a monotone function on L. Then f has a complete lattice of fixed points. In particular, the least fixed point is the meet of all its pre-fixed points and the greatest fixed point is the join of all its post-fixed points.

Theorem 2 (Kleene). For a continuous function f the least fixed point x is obtained as the least upper bound of the Kleene chain:

$x = \bigsqcup\{\bot, f(\bot), f^2(\bot), f^3(\bot), \ldots\}$

where $\bot$ is the bottom element of the lattice.
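On a finite powerset lattice every monotone function is continuous and the Kleene chain becomes stationary after finitely many steps, so its least upper bound is simply its last element. The following Python sketch (our own illustration; `kleene_lfp` and `h` are hypothetical names) computes the chain for the function "roots plus successors", whose least fixed point is the set of nodes reachable from the roots.

```python
def kleene_lfp(f, bottom):
    """Iterate bottom, f(bottom), f(f(bottom)), ... until the chain becomes stationary.

    On a finite lattice the chain of a monotone f is increasing and stabilizes,
    so its least upper bound is its last element.
    """
    chain = [bottom]
    while True:
        nxt = f(chain[-1])
        if nxt == chain[-1]:
            return nxt, chain
        chain.append(nxt)


ROOTS = frozenset({0})
SUCS = {0: [1], 1: [2], 2: [0], 3: [4], 4: []}


def h(s: frozenset) -> frozenset:
    """Roots plus successors: its least fixed point is the set of reachable nodes."""
    return ROOTS | frozenset(b for a in s for b in SUCS[a])


lfp, chain = kleene_lfp(h, frozenset())
print(sorted(lfp))                 # [0, 1, 2]
print([sorted(s) for s in chain])  # [[], [0], [0, 1], [0, 1, 2]]
```

Each element of the chain is a "macro step": it recomputes the successors of everything found so far.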

In the meantime it has been shown that the essence of these theorems also holds in the simpler structure of complete partial orders (cpos).⁴

More recently Cai and Paige [6] published a number of generalizations that are streamlined 
towards practical algorithmic implementations of fixpoint computations. We paraphrase their 
main result here, since we are going to utilize it as a "blueprint" for our subsequent development. 

Theorem 3 (Cai-Paige). Let A be a cpo and $f: A \to A$ be a monotone function that is inflationary in r. Let moreover $\{s_0, s_1, s_2, \ldots, s_n\}$ be an arbitrary sequence obeying the conditions

$$
\begin{aligned}
& r = s_0 \\
& s_i < s_{i+1} \le f(s_i) \quad \text{for } i < n \\
& s_n = f(s_n)
\end{aligned}
$$

Then $s_n$ is the least fixed point of f relative to r. Conversely, when the least fixed point is finitely computable, then the sequence will lead to such an $s_n$.

Theorem 3 provides a natural abstraction from workset-based iterative algorithms, which maintain a workset of change items. At each iteration, a change item is selected and used to generate the next element of the iteration sequence. The incremental changes tend to be small and localized; hence this is called the micro-step approach, and the Kleene chain the macro-step approach [23]. All practical collectors use a workset that records nodes that await marking.
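A micro-step iteration in the sense of Theorem 3 can be sketched in Python as follows (our own illustration; `micro_step_lfp` and `obeys_theorem3` are hypothetical names). Each step adds a single, randomly chosen change item from $f(s) \setminus s$; for clarity the candidate set is recomputed on every step, whereas an efficient version would maintain it incrementally. The helper checks the conditions of the theorem on the generated sequence.

```python
import random

ROOTS = frozenset({0})
SUCS = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}  # node 4 is unreachable


def h(s: frozenset) -> frozenset:
    """Monotone function 'roots plus successors'; inflationary in the empty set."""
    return ROOTS | frozenset(b for a in s for b in SUCS[a])


def micro_step_lfp(f, r: frozenset, rng: random.Random):
    """Add one change item from f(s) - s per step until no candidate is left."""
    s, seq = r, [r]
    while True:
        delta = f(s) - s
        if not delta:  # f(s) <= s, and s <= f(s) holds throughout, so s = f(s)
            return s, seq
        s = s | {rng.choice(sorted(delta))}
        seq.append(s)


def obeys_theorem3(f, seq) -> bool:
    steps_ok = all(seq[i] < seq[i + 1] <= f(seq[i]) for i in range(len(seq) - 1))
    return steps_ok and f(seq[-1]) == seq[-1]


for seed in range(3):
    result, seq = micro_step_lfp(h, frozenset(), random.Random(seed))
    print(sorted(result), obeys_theorem3(h, seq))  # [0, 1, 2, 3] True for every seed
```

Whatever order the change items are picked in, the sequence ends in the same least fixed point, which is what licenses the freedom of choice exploited by workset-based collectors.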



⁴ A cpo is a partial order in which every directed subset has a supremum.
To illustrate these basic results, we derive the overall structure of a stop-the-world collector. The essence of it is the iterative algorithm for finding garbage nodes to recycle.

Letting roots denote the roots of the active graph together with the head of the supply list, we have

$live = f^*(roots)$

where

$f(R) = \{b \mid b \in G.sucs(a) \wedge a \in R\}$;

in words, the live nodes are the closure of the roots under the successor function in the current graph G.

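To derive an algorithm for computing the dead nodes, we calculate as follows:

$\mathit{dead}$
= { definition }
$\complement\, \mathit{live}$
= { definition }
$\complement\, f^*(\mathit{roots})$
= { using the law $\complement\, h^*(R) = \nu i$ where $i(x) = \complement\,(R \cup h(\complement\, x))$ }
$\nu g$

where $\nu g$ is the greatest fixpoint of the monotone function

$g(x) = \mathit{nodes} \setminus (\mathit{roots} \cup \{b \mid b \in \mathit{sucs}(a) \wedge a \in \mathit{nodes} \setminus x\})$.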

This allows us to produce a correct but naive iterative algorithm for computing the dead nodes, based on Theorem 2.



Program 1 Raw Fixpoint Iteration Program 



W ← h.nodes;
while W ≠ g(W) do W ← g(W)
return W
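Program 1 can be transcribed almost literally. The following hypothetical Python rendering uses the same dict-based graph representation as before (an assumption, not part of the original development); the loop is the descending Kleene iteration of $g$ starting from the top element $\mathit{nodes}$.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def g(graph: Graph, roots: Set[str], x: Set[str]) -> Set[str]:
    """g(x) = nodes minus (roots | {b | b in sucs(a), a in nodes minus x})."""
    nodes = set(graph)
    marked = roots | {b for a in nodes - x for b in graph[a]}
    return nodes - marked


def dead_naive(graph: Graph, roots: Set[str]) -> Set[str]:
    """Program 1: start at the top element and iterate W := g(W)."""
    w = set(graph)                            # W <- h.nodes
    while w != (nxt := g(graph, roots, w)):
        w = nxt                               # W <- g(W)
    return w
```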



Following Cai and Paige [6], we can construct an efficient fixpoint iteration algorithm using a 
workset defined by 

$WS = X \setminus g(X)$.

Although this workset definition is created by instantiating a problem-independent scheme, it has an intuitive meaning: the workset consists of the roots and of the nodes whose parents have been "marked" as live, as long as they themselves have not yet been marked. The workset expression can be simplified as follows:

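$X \setminus g(X)$
= { definition }
$X \setminus (\mathit{nodes} \setminus (\mathit{roots} \cup \{b \mid b \in \mathit{sucs}(a) \wedge a \in \mathit{nodes} \setminus X\}))$
= { using the law $A \setminus (B \cup C) = (A \setminus B) \setminus C$ }
$X \setminus ((\mathit{nodes} \setminus \mathit{roots}) \setminus \{b \mid b \in \mathit{sucs}(a) \wedge a \in \mathit{nodes} \setminus X\})$
= { using the law $A \setminus (B \setminus C) = (A \setminus B) \cup (A \cap C)$ }
$(X \setminus (\mathit{nodes} \setminus \mathit{roots})) \cup (\{b \mid b \in \mathit{sucs}(a) \wedge a \in \mathit{nodes} \setminus X\} \cap X)$
= { using the law $\{x \mid P(x)\} \cap Q = \{x \mid P(x) \wedge x \in Q\}$ }
$(X \setminus (\mathit{nodes} \setminus \mathit{roots})) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in X \wedge a \in \mathit{nodes} \setminus X\}$
= { again using the law $A \setminus (B \setminus C) = (A \setminus B) \cup (A \cap C)$ (on the first term) }
$(X \setminus \mathit{nodes}) \cup (X \cap \mathit{roots}) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in X \wedge a \in \mathit{nodes} \setminus X\}$
= { simplifying, since $X \subseteq \mathit{nodes}$ }
$\emptyset \cup (X \cap \mathit{roots}) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in X \wedge a \in \mathit{nodes} \setminus X\}$
= { simplifying }
$(X \cap \mathit{roots}) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in X \wedge a \in \mathit{nodes} \setminus X\}$.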
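Since every step of this calculation is a set-theoretic law, the result can also be confirmed mechanically on small examples. The following Python sketch (hypothetical helper names, same dict-based graph representation as before) compares $X \setminus g(X)$ with the simplified form for every subset $X$ of the nodes of a small graph. It is a sanity check on examples, not a proof.

```python
from itertools import chain, combinations
from typing import Dict, Iterator, Set

Graph = Dict[str, Set[str]]


def subsets(xs: Set[str]) -> Iterator[Set[str]]:
    """All subsets of xs, smallest first."""
    items = sorted(xs)
    for c in chain.from_iterable(combinations(items, k) for k in range(len(items) + 1)):
        yield set(c)


def g(graph: Graph, roots: Set[str], x: Set[str]) -> Set[str]:
    nodes = set(graph)
    return nodes - (roots | {b for a in nodes - x for b in graph[a]})


def ws_simplified(graph: Graph, roots: Set[str], x: Set[str]) -> Set[str]:
    """(X & roots) | {b | b in sucs(a), b in X, a in nodes minus X}"""
    nodes = set(graph)
    return (x & roots) | {b for a in nodes - x for b in graph[a] if b in x}


def derivation_holds(graph: Graph, roots: Set[str]) -> bool:
    return all(x - g(graph, roots, x) == ws_simplified(graph, roots, x)
               for x in subsets(set(graph)))
```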

The greatest fixpoint expression can be computed by the workset-based Program 2, which is 
based on Theorem 3. 

Program 2 Workset-based Fixpoint Iteration Program 

1  W ← nodes;
2  while ∃z ∈ ((W ∩ roots) ∪ {b | b ∈ sucs(a) ∧ b ∈ W ∧ a ∈ nodes \ W}) do
3      W ← W − z
4  return W
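A direct transcription of Program 2, again as a hypothetical Python sketch over the dict-based graph representation: the workset is recomputed from scratch in every iteration, and $z$ is chosen deterministically only to make runs reproducible (any choice of $z$ is admissible).

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def workset(graph: Graph, roots: Set[str], w: Set[str]) -> Set[str]:
    """(W & roots) | {b | b in sucs(a), b in W, a in nodes minus W}"""
    nodes = set(graph)
    return (w & roots) | {b for a in nodes - w for b in graph[a] if b in w}


def dead_workset(graph: Graph, roots: Set[str]) -> Set[str]:
    w = set(graph)                              # line 1: W <- nodes
    while ws := workset(graph, roots, w):       # line 2: some z is in the workset
        z = min(ws)
        w = w - {z}                             # line 3: W <- W - z
    return w                                    # line 4: return W
```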



To improve the performance of this algorithm, we apply the Finite Differencing transformation [21], according to which we incrementally maintain the invariant

$WS = (W \cap \mathit{roots}) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in W \wedge a \in \mathit{nodes} \setminus W\}$.

There are two places in the code that might disrupt the invariant, namely lines 1 and 3. We establish the invariant in line 1 with respect to the initialization $W = \mathit{nodes}$ as follows:

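Assume: $W = \mathit{nodes}$

Simplify: $WS = (W \cap \mathit{roots}) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in W \wedge a \in \mathit{nodes} \setminus W\}$
$WS = (\mathit{roots} \cap \mathit{nodes}) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in \mathit{nodes} \wedge a \in \mathit{nodes} \setminus \mathit{nodes}\}$
$= \mathit{roots} \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in \mathit{nodes} \wedge a \in \emptyset\}$
$= \mathit{roots} \cup \emptyset$
$= \mathit{roots}$.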

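For line 3 we derive incremental code with respect to the change $W' = W - z$:

Assume: $WS = (W \cap \mathit{roots}) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in W \wedge a \in \mathit{nodes} \setminus W\}$

$\wedge\ W' = W - z$

Simplify: $WS' = (W' \cap \mathit{roots}) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in W' \wedge a \in \mathit{nodes} \setminus W'\}$
= { using the assumption $W' = W - z$ }

$((W - z) \cap \mathit{roots}) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in (W - z) \wedge a \in \mathit{nodes} \setminus (W - z)\}$
= { simplifying }

$((W \cap \mathit{roots}) - z) \cup (\{b \mid b \in \mathit{sucs}(a) \wedge b \in W \wedge a \in \mathit{nodes} \setminus (W - z)\} - z)$
= { pulling out the common subtraction of $z$ }

$((W \cap \mathit{roots}) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in W \wedge a \in \mathit{nodes} \setminus (W - z)\}) - z$
= { distributing element membership }

$((W \cap \mathit{roots}) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in W \wedge (a \in \mathit{nodes} \setminus W \vee a = z)\}) - z$
= { distributing the set-former over the disjunction }

$((W \cap \mathit{roots}) \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in W \wedge a \in \mathit{nodes} \setminus W\} \cup \{b \mid b \in \mathit{sucs}(a) \wedge b \in W \wedge a = z\}) - z$

= { folding the definition of $WS$, and simplifying }

$(WS \cup \{b \mid b \in \mathit{sucs}(z) \wedge b \in W\}) - z$.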

The resulting code is shown in Program 3. 

Programs 2 and 3 represent the abstract structure of most marking algorithms. Our point is that their derivation, and further steps toward implementation, are carried out by generic, problem-independent transformations, supported by domain-specific simplifications, as above.

Further progress toward a detailed implementation requires a variety of other transformations, such as finite differencing, simplification, and datatype refinements. For example, the finite set W may be implemented by a characteristic function, which in turn is refined to a bit array, and the workset WS by concurrent data structures such as local buffers or work-stealing queues.

Program 3 Optimized Fixpoint Iteration Program

invariant WS = (W ∩ roots) ∪ {b | b ∈ h.sucs(a) ∧ b ∈ W ∧ a ∈ h.nodes \ W}
W := h.nodes ∥ WS := roots;
while ∃z ∈ WS do
    W := W − z ∥ WS := (WS ∪ {b | b ∈ h.sucs(z) ∧ b ∈ W}) − z
output W.
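The finite-differencing result can be exercised by running Program 3 with its invariant checked at every iteration. In this hypothetical Python sketch the parallel assignment ∥ becomes a tuple assignment, whose right-hand sides are evaluated with the old value of W, as the derivation assumes; the `assert` statements are our own addition and recompute the invariant from scratch.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def ws_spec(graph: Graph, roots: Set[str], w: Set[str]) -> Set[str]:
    """Right-hand side of the invariant, computed from scratch."""
    nodes = set(graph)
    return (w & roots) | {b for a in nodes - w for b in graph[a] if b in w}


def dead_incremental(graph: Graph, roots: Set[str]) -> Set[str]:
    w, ws = set(graph), set(roots)              # W := h.nodes || WS := roots
    while ws:                                   # while some z in WS
        assert ws == ws_spec(graph, roots, w)   # invariant of Program 3
        z = min(ws)
        # W := W - z || WS := (WS | {b | b in h.sucs(z), b in W}) - z
        w, ws = w - {z}, (ws | {b for b in graph[z] if b in w}) - {z}
    assert ws_spec(graph, roots, w) == set()    # W is the greatest fixpoint
    return w
```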

As our final preparatory step within the realm of the classical fixed-point concepts, we mention a central property that is the core of the correctness proof for implementations. If we compute the sequence $s_0, s_1, s_2, \ldots$ by a loop, then a Hoare-style verification needs the following invariant.

Corollary 1 (Invariance of closure). The elements of the set $\{s_0 < s_1 < s_2 < \cdots < s_n\}$ all have the same closure:

$f^*(s_i) = f^*(r)$

Proof. This invariance follows by induction from $s_0 = r$, using monotonicity and the properties of $f^*$ stated in Lemma 4: $f^*(s_i) \le f^*(s_{i+1}) \le f^*(f(s_i)) = f^*(s_i)$. (End of proof)
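Corollary 1 can be observed on concrete runs. The sketch below (hypothetical Python, same dict-based graph representation) builds an arbitrary sequence in the sense of Theorem 3 by adding a random nonempty part of $f(s_i) \setminus s_i$ in each step, and checks that all members have the same closure. The random choices stand in for the freedom that Theorem 3 leaves to an implementation.

```python
import random
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def f(graph: Graph, s: Set[str]) -> Set[str]:
    return s | {b for a in s for b in graph[a]}


def closure(graph: Graph, s: Set[str]) -> Set[str]:
    s = set(s)
    while (t := f(graph, s)) != s:
        s = t
    return s


def random_chain(graph: Graph, r: Set[str], seed: int) -> List[Set[str]]:
    """s_0 = r and s_i < s_{i+1} <= f(s_i), until s_n = f(s_n)."""
    rng = random.Random(seed)
    s = set(r)
    chain = [set(s)]
    while new := sorted(f(graph, s) - s):
        s = s | set(rng.sample(new, rng.randint(1, len(new))))
        chain.append(set(s))
    return chain


def closure_invariant_holds(graph: Graph, r: Set[str], seed: int) -> bool:
    target = closure(graph, r)
    return all(closure(graph, s) == target for s in random_chain(graph, r, seed))
```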

3.2 Fixed Points in Dynamic Settings (Concurrent Collectors) 

The classical fixed-point considerations work with a fixed monotone function $f$. In the garbage collection application this is justified as long as the graph on which the collector works remains fixed during the collector's activities. But as soon as the mutator keeps working in parallel with the collector, the graph keeps changing while the collector is active. This can be modeled by considering a sequence of graphs $G_0, G_1, G_2, \ldots$ and by making the function $f$ dependent on these graphs: $f(G_0)(\ldots), f(G_1)(\ldots), f(G_2)(\ldots), \ldots$, where $f: \mathit{Graph} \to \mathit{Set}(\mathit{Node}) \to \mathit{Set}(\mathit{Node})$

and

$f(G)(S) = S \cup \{b \mid a \in S \wedge b \in G.\mathit{sucs}(a)\}$.

Intuitively, $f$ extends a given set of nodes with the set of their successors in the graph. To ease readability we omit the explicit reference to the graphs and simply write $f_0, f_1, f_2, \ldots$.

The foundation. Using this notational liberty, the specification of the underlying foundation is stated in Figure 10: the $f_i$ are monotone ① and inflationary in $r$ ②. Moreover, the closure-forming operator $f_i^*$ is defined by ③.

The initial problem formulation. Based on this foundation we can now formulate our goal. Recall the specification of the garbage collection task given by Collector in Figure 5:
SPEC Foundation
    EXTEND Cpo(A)                                                 // A is a cpo (alternatively: lattice)
    $f_0, f_1, f_2, \ldots : A \to A$                                  // sequence of functions
    $f_0^*, f_1^*, f_2^*, \ldots : A \to A$                            // $f_i^*$ is the reflexive-transitive closure of $f_i$
    $r : A$                                                         // "root"
    $x \le y \Rightarrow f_i(x) \le f_i(y)$                       ①   // all $f_i$ are monotone
    $r \le f_i(r)$                                                ②   // all $f_i$ are inflationary in $r$
    $f_i^*(x) = \mathrm{LEAST}\ s: x \le s \wedge s = f_i(s)$       ③   // closure (computes least fixed point)

Fig. 10. Initial Specification



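$\mathit{live} \subseteq \mathit{trace}(G) \subseteq \mathit{nodes}$. This translates into our dynamic setting as $\mathit{live}_n \subseteq s \subseteq \mathit{nodes}$. We add as a working hypothesis that the set $\mathit{live}_0$ serves as an upper bound that we will need to guarantee in our dynamic algorithm: $\mathit{live}_n \subseteq s \subseteq \mathit{live}_0 \subseteq \mathit{nodes}$.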

The set $\mathit{live}_0$ is sometimes called the "snapshot-at-the-beginning" [1]. Since in our abstract setting $\mathit{live}_n$ corresponds to the closure $f_n^*(r)$ and $\mathit{live}_0$ corresponds to the closure $f_0^*(r)$, we immediately obtain the abstract formulation ② of our problem statement (Figure 11).



SPEC FixpointProblem
    EXTEND Foundation
    $r \le x \Rightarrow f_{i+1}(f_i^*(x)) \le f_i^*(x)$                      ①   // garbage can only grow
    THM $\exists n, s: f_n^*(r) \le s \le f_0^*(r)$                          ②   // $\mathit{live}_n \le s \le \mathit{live}_0$
    THM $r \le x \Rightarrow f_0^*(x) \ge f_1^*(x) \ge f_2^*(x) \ge \cdots$   ③   // Lemma 5

Fig. 11. Initial Specification



Axiom ① is the abstract counterpart of the fundamental Proposition 1 in Section 2.2: the set of live nodes is monotonically decreasing over time, or, dually, garbage increases monotonically. (For proof-technical reasons we have to conditionalize this property to any set $x$ containing the roots $r$.)

Note that the existential formula ② is trivially provable by setting $n = 0$ and $s = f_0^*(r)$. Actually, property ③ (see Lemma 5 below) shows that such an $s$ exists for any $n$. However, our actual task will be to come up with a constructive algorithm that yields such an $n$ and $s$.†

For the specification FixpointProblem we can prove the property ③ (i.e. Lemma 5) that will be needed later on. This monotonic decrease of the closure is in accordance with our intuitive perception of the Mutator's activities. The operation delArc may lead to fewer live nodes. And the operations addArc and addNew do not change the set of live nodes (since the freelist is part of the live nodes).

† It may be interesting to compare property ② to the classical non-concurrent situation expressed in Lemma 1. Instantiated to the final value $s = s_n$, Lemma 1 states $s = f^*(r)$. This equality can be formally rewritten into the two inequalities $f^*(r) \le s \le f^*(r)$. This is weakened in property ② by setting $f = f_n$ on the left and $f = f_0$ on the right. Similar kinds of weakenings are also found in Hoare- or Dijkstra-style program developments, when deriving invariants from given pre- or post-conditions.

Lemma 5 (Antitonicity of closure). The closures are monotonically decreasing: for $r \le x$ we have $f_0^*(x) \ge f_1^*(x) \ge f_2^*(x) \ge \cdots$

Proof: We use a more general formulation of this lemma: for monotone $g$ and $h$ we have the property

$\forall x: g(h^*(x)) \le h^*(x) \Rightarrow g^*(x) \le h^*(x)$

We show by induction that $\forall i: g^i(x) \le h^*(x)$. Initially we have $g^0(x) = x \le h^*(x)$ due to the general reflexivity property ③ of the closure. The induction step uses the induction hypothesis and then the premise: $g^{i+1}(x) = g(g^i(x)) \le g(h^*(x)) \le h^*(x)$. Hence, in the limit, $g^*(x) \le h^*(x)$.

By instantiating $f_{i+1}$ for $g$ and $f_i$ for $h$ we immediately obtain $f_{i+1}^*(x) \le f_i^*(x)$ by using axiom ①, when $r \le x$. (End of proof)
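Axiom ① and Lemma 5 can also be exercised in a small simulation. The following hypothetical Python sketch models one Mutator action as either delArc(a, b) or addArc(a, b) on live nodes (addNew and the freelist are omitted for brevity, which is an assumption of the sketch), and checks axiom ① and the conclusion of Lemma 5 for random sets $x \supseteq r$. It tests the argument on examples; it does not replace the proof.

```python
import random
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def f(graph: Graph, s: Set[str]) -> Set[str]:
    return s | {b for a in s for b in graph[a]}


def closure(graph: Graph, s: Set[str]) -> Set[str]:
    s = set(s)
    while (t := f(graph, s)) != s:
        s = t
    return s


def mutate(graph: Graph, roots: Set[str], rng: random.Random) -> Graph:
    """G_{i+1} from G_i: delArc(a, b) or addArc(a, b) with a and b live."""
    nxt = {a: set(bs) for a, bs in graph.items()}
    live = sorted(closure(graph, roots))
    a = rng.choice(live)
    if nxt[a] and rng.random() < 0.5:
        nxt[a].discard(rng.choice(sorted(nxt[a])))    # delArc(a, b)
    else:
        nxt[a].add(rng.choice(live))                   # addArc(a, b), b live
    return nxt


def check_lemma5(seed: int, size: int = 8, rounds: int = 40) -> bool:
    rng = random.Random(seed)
    nodes = [f"n{k}" for k in range(size)]
    graph = {a: set(rng.sample(nodes, 2)) for a in nodes}
    roots = {"n0"}
    for _ in range(rounds):
        nxt = mutate(graph, roots, rng)
        x = roots | set(rng.sample(nodes, rng.randint(0, 3)))
        cx = closure(graph, x)
        if not f(nxt, cx) <= cx:          # axiom 1: f_{i+1}(f_i*(x)) <= f_i*(x)
            return False
        if not closure(nxt, x) <= cx:     # Lemma 5: f_{i+1}*(x) <= f_i*(x)
            return False
        graph = nxt
    return True
```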



3.3 The Micro-Step Refinement

In order to get closer to constructive solutions we perform our first essential refinement. Generalizing the idea of Cai and Paige (see Theorem 3), we add further properties to our specification, resulting in the new specification of Figure 12. Note that we now use some member $s_n$ of the sequence $s_0, s_1, s_2, \ldots$ as a witness for the existentially quantified $s$.



SPEC MicroStep
    EXTEND FixpointProblem
    $s_0, s_1, s_2, \ldots : A$                                         // sequence of approximations
    $s_0 = r$                                                      ①   // start with "root"
    $s_i < s_{i+1} \le f_i(s_i) \vee s_i = f_i(s_i)$                 ②   // computation step
    THM $\exists n: f_n^*(r) \le s_n \le f_0^*(r)$                   ③   // to be shown below
    THM $f_0^*(s_0) \ge f_1^*(s_1) \ge \cdots \ge f_n^*(s_n)$       ④   // Lemma 6 below

Fig. 12. The "micro-step approach"



Proof of property ③: In a finite lattice the $s_i$ cannot grow forever. Therefore, due to axiom ②, there must be a fixpoint $s_n = f_n(s_n)$. Then the left half of the proof of ③ follows trivially from monotonicity:

$\forall i: r \le s_i$    // axioms ① and ②

$\vdash r \le s_n = f_n(s_n)$    // $s_n$ is a fixpoint

$\vdash f_n^*(r) \le f_n^*(f_n(s_n)) = f_n^*(s_n) = s_n$    // properties of $f_n^*$, Lemma 4

The right half $s_n \le f_0^*(r)$ is a direct consequence of the following Lemma 6. (End of proof)

Lemma 6 (Decreasing closures). As a variation of Lemma 5 we can show property ④: the closures are decreasing, even when applied to the increasing $s_i$:

$\forall i: f_{i+1}^*(s_{i+1}) \le f_i^*(s_i)$

Proof: On the basis of Lemma 5 (property ③ in Figure 11) the proof follows directly from axiom ② by monotonicity:

$s_{i+1} \le f_i(s_i)$    // axiom ②

$\vdash f_{i+1}^*(s_{i+1}) \le f_{i+1}^*(f_i(s_i)) \le f_i^*(f_i(s_i)) = f_i^*(s_i)$    // monotonicity of $f_{i+1}^*$; Lemma 5

Note that Lemma 5 is applicable here, since, due to axioms ① and ②, $r \le f_i(s_i)$ holds. (End of proof)
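Lemma 6 may be summarized by the following chain:

$s_0 \le s_1 \le s_2 \le \cdots \le s_n = f_n^*(s_n) \le \cdots \le f_2^*(s_2) \le f_1^*(s_1) \le f_0^*(s_0)$

As can be seen here, the approximations $s_0, s_1, s_2, \ldots$ keep growing, while at the same time their closures $f_0^*(s_0), f_1^*(s_1), f_2^*(s_2), \ldots$ keep shrinking.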
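The interplay described by Lemma 6 can be watched in a simulation that interleaves micro-steps of the Collector with Mutator actions. In this hypothetical Python sketch, each Collector step adds one node of $f_i(s_i) \setminus s_i$ (so the computation-step axiom of Figure 12 holds), and each Mutator action is delArc or addArc on live nodes, an assumption chosen so that "garbage can only grow" holds. The checks correspond to the growing $s_i$, the shrinking closures of Lemma 6, and the bounds $f_n^*(r) \le s_n \le f_0^*(r)$.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def f(graph: Graph, s: Set[str]) -> Set[str]:
    return s | {b for a in s for b in graph[a]}


def closure(graph: Graph, s: Set[str]) -> Set[str]:
    s = set(s)
    while (t := f(graph, s)) != s:
        s = t
    return s


def mutate(graph: Graph, roots: Set[str], rng: random.Random) -> Graph:
    nxt = {a: set(bs) for a, bs in graph.items()}
    live = sorted(closure(graph, roots))
    a = rng.choice(live)
    if nxt[a] and rng.random() < 0.5:
        nxt[a].discard(rng.choice(sorted(nxt[a])))    # delArc(a, b)
    else:
        nxt[a].add(rng.choice(live))                   # addArc(a, b), b live
    return nxt


def concurrent_run(g0: Graph, roots: Set[str],
                   rng: random.Random) -> Tuple[List[Graph], List[Set[str]]]:
    """Returns G_0..G_n and s_0..s_n; stops as soon as s_n = f_n(s_n)."""
    graphs, chain = [g0], [set(roots)]
    while new := sorted(f(graphs[-1], chain[-1]) - chain[-1]):
        chain.append(chain[-1] | {new[0]})             # s_i < s_{i+1} <= f_i(s_i)
        graphs.append(mutate(graphs[-1], roots, rng))  # Mutator yields G_{i+1}
    return graphs, chain


def lemma6_holds(seed: int, size: int = 10) -> bool:
    rng = random.Random(seed)
    nodes = [f"n{k}" for k in range(size)]
    g0 = {a: set(rng.sample(nodes, 2)) for a in nodes}
    roots = {"n0"}
    graphs, chain = concurrent_run(g0, roots, rng)
    closures = [closure(g, s) for g, s in zip(graphs, chain)]
    growing = all(a < b for a, b in zip(chain, chain[1:]))
    shrinking = all(a >= b for a, b in zip(closures, closures[1:]))
    bounded = closure(graphs[-1], roots) <= chain[-1] <= closure(g0, roots)
    return growing and shrinking and bounded
```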

Remark: This situation can also be rephrased as follows: We have a function $F(f_i, s_i) = f_i^*(s_i)$ that is applied to the elements of two sequences. Along these sequences the function is antitone in the first argument and monotone in the second argument; we have to show that, under the constraints given in our specification, the resulting sequence of values is still monotonically decreasing.

This essentially concludes the derivation that can reasonably be done on this highly abstract mathematical level of fixed points and lattices. However, in the literature one can find a variant of collectors whose development is best prepared on this level of abstraction as well.

3.4 A Side Track: Snapshot Algorithms 

Consider again the specification MicroStep in Figure 12, where the goal is described in property ③ as $\exists n: f_n^*(r) \le s_n \le f_0^*(r)$. Evidently, computing the value $s_n = f_0^*(r)$ is an admissible solution.‡ §

‡ This approach, which has been used by Yuasa [32] and was refined later by Azatchi et al. [1], is also referred to as snapshot-at-the-beginning.

§ Remember that the closure computes the live nodes; property ③ therefore means $\mathit{live}_n \subseteq s_n \subseteq \mathit{live}_0$, which can be solved by $s_n = \mathit{live}_0$. In other words, we compute the nodes that were live when the Collector started.
In order to follow this development path, we refine the specification MicroStep in Figure 12 to the specification Snapshot in Figure 13 by requiring the additional constraint $\forall i: f_0^*(s_i) = f_0^*(r)$, which needs to be respected by later implementations. Note that this new axiom is the classical invariant that is also used in non-concurrent garbage collectors (compare Corollary 1).



SPEC Snapshot
    EXTEND MicroStep
    $\forall i: f_0^*(s_i) = f_0^*(r)$        // classical invariant $\mathit{live}_i = \mathit{live}_0$

Fig. 13. The "snapshot approach"



How can this kind of computation be achieved in practice? This is demonstrated by Azatchi et al. [1], based on a technique that has been developed by some of the authors for a concurrent reference-counting collector [18]. Starting from the fictitious idea of making a virtual snapshot by cloning the complete original heap, it is then shown that this copying can be done lazily, such that only those nodes are actually copied that are critical.

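We only sketch this idea here abstractly in our framework: We introduce a structure $\mathit{clone}_i$ and an override operator $\triangleleft$ such that $(G_i \triangleleft \mathit{clone}_i) \approx G_0$, i.e. $x \notin \mathit{clone}_i \Rightarrow G_i.\mathit{sucs}(x) = G_0.\mathit{sucs}(x)$ and $x \in \mathit{clone}_i \Rightarrow \mathit{clone}_i.\mathit{sucs}(x) = G_0.\mathit{sucs}(x)$. The computation of the sequence $s_0, s_1, s_2, \ldots$ is then based on $(G_i \triangleleft \mathit{clone}_i)$, such that we effectively always apply $f_0$.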

It remains to determine the structure clone i. There are solutions of varying granularity, but the 
most reasonable choice appears to be the following: Whenever the Mutator executes one of the 
operations addArc{a, b), delArc(a, b) or addNew(a), it puts the old a including all its outgoing 
arcs into clone. (This amounts to making a copy of the heap cell.) In practice, the efficiency of 
this approach is considerably improved by observing that the cloning need only be done during 
a very short phase of the collection cycle. The most complex aspect of this approach is the 
computation of the pointers into the heap that come from the local fields (stack, registers) of 
the Mutator; it has to be ensured that during this phase no simultaneously changed pointers get 
lost. In [1,18] this is performed by a technique called "snooping": essentially all pointers that 
are changed during this phase are treated as roots. (We will come back to this in Section 5.) 
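The lazy cloning can be pictured as a copy-on-write write barrier. In the following hypothetical Python sketch (class and method names are ours; addNew and the root-snooping phase are omitted), the Mutator's addArc and delArc save the old successor set of a node into `clone` the first time that node is modified during a cycle, and the Collector reads successors through the override of $G_i$ by $\mathit{clone}_i$, so that it effectively traverses $G_0$.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


class SnapshotHeap:
    """Mutator writes go to `graph`; before the first change to a node's
    outgoing arcs in a collection cycle, the old arcs are saved in `clone`."""

    def __init__(self, graph: Graph) -> None:
        self.graph = {a: set(bs) for a, bs in graph.items()}
        self.clone: Graph = {}
        self.cycle_active = False

    def start_cycle(self) -> None:
        self.clone.clear()
        self.cycle_active = True

    def _save(self, a: str) -> None:
        if self.cycle_active and a not in self.clone:
            self.clone[a] = set(self.graph[a])    # copy the heap cell once

    def add_arc(self, a: str, b: str) -> None:
        self._save(a)
        self.graph[a].add(b)

    def del_arc(self, a: str, b: str) -> None:
        self._save(a)
        self.graph[a].discard(b)

    def collector_sucs(self, a: str) -> Set[str]:
        """Successors in G_i overridden by clone_i, i.e. as they were in G_0."""
        return self.clone[a] if a in self.clone else self.graph[a]


def mark_snapshot(heap: SnapshotHeap, roots: Set[str]) -> Set[str]:
    marked, work = set(roots), list(roots)
    while work:
        for b in heap.collector_sucs(work.pop()):
            if b not in marked:
                marked.add(b)
                work.append(b)
    return marked
```

Even after the Mutator has rewired the heap, the marking result is $\mathit{live}_0$, the snapshot-at-the-beginning.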

For the technical details of this approach we refer to [1]. Suffice it to say that almost all of 
our subsequent refinements - which we start from the specification MicroStep - could also 
descend from Snapshot. Technically, we could combine Snapshot with our various refinements 
of MicroStep by forming a pushout. 

Remark: By showing that such an effectively working implementation exists, we have implicitly 
shown that the specification Snapshot is consistent with the specification MicroStep; that is, the 
refinement is admissible. 
4 Garbage Collection in Dynamic Graphs



We now take specific properties of garbage collection into account, but still on the "semi-abstract" level of sets and graphs.

First we note that our specification of garbage collection using sets and set inclusion is a trivial instance of the lattice-oriented specification in the previous section. Therefore all results carry over to the concrete problem. The morphism $\Phi$ is essentially defined by the following correspondences:

$A \mapsto \mathit{Set}(\mathit{Node})$

$\le\ \mapsto\ \subseteq$

$f_i(s) \mapsto f(G_i)(s) = s \cup G_i.\mathit{sucs}(s) = s \cup \bigcup_{a \in s} G_i.\mathit{sucs}(a)$

$r \mapsto G_0.\mathit{roots}$



- The basis now is a sequence of graphs $G_0, G_1, G_2, \ldots$, which are due to the activities of the Mutator.

- The function $f(G_i)(s) = s \cup \bigcup_{a \in s} G_i.\mathit{sucs}(a)$ adds to the set $s$ all its direct successors. (We will retain the shorthand notation $f_i = f(G_i)$ in the following.)



[Figure: roadmap diagram. Upper half: SPEC Foundation → SPEC FixpointProblem → SPEC MicroStep, with the side track SPEC Snapshot. Lower half: SPEC Reachability → SPEC Workset → SPEC Dirtyset, repeated in a right-hand column for the combination with Snapshot.]

Fig. 14. Roadmap of refinements



Figure 14 illustrates the road map through our essential refinements. The upper half shows the 
refinements that have been performed in the previous Section 3 on the abstract mathematical 
level of lattices and fixed points. The lower half shows the refinements on the semi-abstract level
of graphs and sets that will be presented in this section. Finally, the right side of the diagram 
points out that all further developments could also be combined (by way of pushouts) with the 
sidetrack of the snapshot approach of Section 3.4. 

Lemma 7 (Morphism abstract → concrete). Under the above morphism $\Phi$ all axioms of the abstract specifications Foundation, FixpointProblem and MicroStep hold for the more concrete specifications of graphs and sets (see Figure 14).

Proof: We show the three morphism properties $\Phi_1, \Phi_2, \Phi_3$ in turn.

$\Phi_1$: The proof is trivial, since the monotonicity axiom ① is a direct consequence of the definition of $\Phi(f_i)$. Axiom ③ is just a definition.

$\Phi_2$: To foster intuition, we first consider the special case $x = r$. The morphism translates axiom ① of Figure 11 as follows:

$f(G_{i+1})(f(G_i)^*(r)) \subseteq f(G_i)^*(r)$

$\Leftrightarrow (\mathit{live}_i \cup \bigcup_{a \in \mathit{live}_i} G_{i+1}.\mathit{sucs}(a)) \subseteq \mathit{live}_i$    // $f(G_i)^*(r) = \mathit{live}_i$, def. of $\Phi(f_i)$

$\Leftrightarrow \forall a \in \mathit{live}_i: G_{i+1}.\mathit{sucs}(a) \subseteq \mathit{live}_i$    // $(A_1 \cup \ldots \cup A_n) \subseteq B \Leftrightarrow \forall k: A_k \subseteq B$

In order to prove this last property, i.e. $\forall a \in \mathit{live}_i: G_{i+1}.\mathit{sucs}(a) \subseteq \mathit{live}_i$, we must consider all nodes $a \in \mathit{live}_i$ and all (sequences of) actions that the Mutator can use to effect the transition $G_i \to G_{i+1}$. We distinguish the two possibilities for $a \in \mathit{live}_i$:

(1) $a \in G_i.\mathit{freelist}$: Then there are two subcases (which are based on the reasonable constraint that nodes in the freelist and newly created nodes do not have "wild" outgoing pointers):

(1a) $a \in G_{i+1}.\mathit{freelist}$; then $G_{i+1}.\mathit{sucs}(a) \subseteq G_{i+1}.\mathit{freelist} \subseteq G_i.\mathit{freelist} \subseteq \mathit{live}_i$

(1b) $a \in G_{i+1}.\mathit{active}$ (caused by addNew); then $G_{i+1}.\mathit{sucs}(a) = \emptyset$; now (2) applies

(2) $a \in G_i.\mathit{active}$: Then there are three subcases for $b \in G_{i+1}.\mathit{sucs}(a)$:

(2a) $(a \to b) \in G_i.\mathit{arcs} \vdash b \in G_i.\mathit{active} \subseteq \mathit{live}_i$

(2b) $(a \to b)$ created by addArc$(a, b) \vdash b \in G_i.\mathit{active} \subseteq \mathit{live}_i$

(2c) $(a \to b)$ created by addNew$(a) \vdash b \in G_i.\mathit{freelist} \subseteq \mathit{live}_i$

If we start this line of reasoning not from the roots $r$ but from a superset $x \supseteq r$, then we need to consider supersets $\hat{l}_i \supseteq \mathit{live}_i$ (where the hat indicates that these sets are closed under reachability) and prove $\forall a \in \hat{l}_i: G_{i+1}.\mathit{sucs}(a) \subseteq \hat{l}_i$. Evidently the reasoning in (1) and (2) applies here as well. But now there is a third case:

(3) $a \in G_i.\mathit{dead}$: In this case there is no operation of the Mutator that could change the successors of $a$ (since all operations require $a \in \mathit{active}$). Hence $G_{i+1}.\mathit{sucs}(a) = G_i.\mathit{sucs}(a)$. Due to the closure property we have $a \in \hat{l}_i \Rightarrow G_i.\mathit{sucs}(a) \subseteq \hat{l}_i$. The above equality then also entails $G_{i+1}.\mathit{sucs}(a) \subseteq \hat{l}_i$.
$\Phi_3$: The morphism $\Phi$ translates the axioms ① and ② of Figure 12 into

$s_i \subset s_{i+1} \subseteq s_i \cup \bigcup_{a \in s_i} G_i.\mathit{sucs}(a)$

This is trivially fulfilled, such that the constraint on the choice of $s_{i+1}$ is well-defined.

(End of proof)

When considering the last specification MicroStep in Figure 12, we have basically shown that any sequence $s_0, s_1, s_2, \ldots$ that fulfills the constraints ① and ② solves our task. But we have not yet given a constructive algorithm for building such a sequence. In the next refinement steps $\Phi_4$ and $\Phi_5$ we will proceed further towards such a constructive implementation (actually towards a whole collection of implementation variants) by adding more and more constraints to our specification. Each of these refinements constitutes a design decision that narrows down the set of remaining implementations.

4.1 Worksets ("Wavefront")

As a first step towards more constructive descriptions we return to the standard idea of "worksets" (sometimes also referred to as "wavefront"), which has already been illustrated in Program 2 and in the examples in Section 2.5. This refinement is given in Figure 15.



SPEC Workset
    EXTEND MicroStep
    $b_0, b_1, b_2, \ldots : A$                                        // completely treated ("black")
    $w_0, w_1, w_2, \ldots : A$                                        // partially treated ("workset" or "gray")
    $s_i = (b_i \uplus w_i)$                                      ①   // partitioning into black and gray
    $f_i^*(s_i) = b_i \cup f_i^*(w_i)$                            ②   // additional constraint
    THM $w_n = \emptyset \Rightarrow f_n^*(s_n) = b_n$               ③   // termination condition

Fig. 15. The workset approach



The partitioning $s_i = (b_i \uplus w_i)$ arises naturally from the definition of the workset, as in Program 2. But the additional axiom ② is a major constraint! It essentially states that the closure $f_i^*(s_i)$ of the current approximation $s_i$ shall depend primarily on the closure of the workset $w_i$. This reduces the design space of the remaining implementations considerably, but from a practical viewpoint this is no problem, since we only exclude inefficient solutions.

Theorem ③ stated in the specification provides a termination condition for the later implementations that is far more efficient than our original termination criterion $f_n(s_n) = s_n$.

An important observation: It is easily seen that the subtle error situation illustrated in Figure 9 in Section 2.5 violates axiom ②. Therefore no further refinement of the specification Workset can exhibit this error. In other words: If we derive all our implementations as offspring of the specification Workset in Figure 15, then we are certain that the bug cannot occur!
A major problem: Unfortunately, just introducing sufficient constraints for excluding error situations is not enough. Consider the situation of Figure 9 in Section 2.5. We have to ensure that whenever the Mutator performs the two operations addArc(A, E) and delArc(D, E), axiom ② is somehow kept intact. This necessitates, for the first time, that the Mutator cooperates with the Collector, thus introducing constraints for the Mutator. (Even though these constraints may be hidden in the component Store, they do have an implicit influence on the Mutator's working.)

As has already been pointed out in Section 2.5, there are three principal possibilities to resolve 
this problem: 

- One can stop the Mutator until the Collector has finished (Section 3.1). 

- One can put A or E into the workset, when addArc{A, E) is executed. 

- One can put E into the workset, when delArc{D, E) is executed. 

Each of these "solutions" keeps axiom ② intact, but they have problems. Stopping the Mutator is unacceptable, since this destroys the very idea of having Mutator and Collector work concurrently. In both of the other cases the Mutator adds elements to the workset while the Collector is taking them out of the workset. Naive implementations of this specification would not guarantee termination.
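The three possibilities can be compared on a hypothetical four-node replay of the critical interleaving. The graph used below (R → A, R → D, D → E) is our own assumption and not a reproduction of Figure 9; the Collector blackens one gray node at a time and grays its non-black successors (the coarse-grained step of Section 4.3). The sketch shows that without cooperation the additional axiom $f_i^*(s_i) = b_i \cup f_i^*(w_i)$ of Figure 15 breaks and the reachable node E is never blackened, whereas each of the barrier variants repairs this. Termination, the concern of the preceding paragraph, is not addressed here.

```python
from typing import Dict, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def f(graph: Graph, s: Set[str]) -> Set[str]:
    return s | {b for a in s for b in graph[a]}


def closure(graph: Graph, s: Set[str]) -> Set[str]:
    s = set(s)
    while (t := f(graph, s)) != s:
        s = t
    return s


def coarse_step(graph: Graph, black: Set[str], gray: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Blacken some gray x and gray its successors that are not yet black."""
    x = min(gray)                          # deterministic choice for the replay
    nb = black | {x}
    return nb, (gray | graph[x]) - nb


def replay(barrier: Optional[str]) -> Tuple[Set[str], Set[str], bool]:
    """barrier: None, "shade-new-target", "regray-source" or "shade-old-target".
    Returns (black nodes, live nodes, whether the axiom held after mutation)."""
    graph = {"R": {"A", "D"}, "A": set(), "D": {"E"}, "E": set()}
    roots = {"R"}
    black, gray = set(), set(roots)
    for _ in range(2):                     # R, then A, become black
        black, gray = coarse_step(graph, black, gray)
    graph["A"].add("E")                    # Mutator: addArc(A, E)
    if barrier == "shade-new-target" and "E" not in black:
        gray.add("E")                      # put E into the workset
    elif barrier == "regray-source" and "A" in black:
        black, gray = black - {"A"}, gray | {"A"}   # put A back into the workset
    graph["D"].discard("E")                # Mutator: delArc(D, E)
    if barrier == "shade-old-target" and "E" not in black:
        gray.add("E")                      # put E into the workset
    axiom_ok = closure(graph, black | gray) == black | closure(graph, gray)
    while gray:
        black, gray = coarse_step(graph, black, gray)
    return black, closure(graph, roots), axiom_ok
```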

In the following we will present a number of refinements for solving this problem. These refinements are the high-level formal counterparts of solutions that can be found in the literature and in realistic production systems for the JVM and .NET.

4.2 "Dirty Nodes" 

One can alleviate the stop times for the Mutator by splitting the workset into two sets, one being the original workset of the Collector, the other collecting the critical nodes from the Mutator. This is shown in Figure 16. The new axiom ② is similar to axiom ② of Figure 15, using the partitioning $w_i = (g_i \uplus d_i)$.



SPEC Dirtyset
    EXTEND Workset
    $g_0, g_1, g_2, \ldots : A$                                                // partially treated by Collector ("gray")
    $d_0, d_1, d_2, \ldots : A$                                                // introduced by Mutator ("dirty")
    $s_i = (b_i \uplus g_i \uplus d_i)$                                   ①   // partitioning into black, gray and dirty
    $f_i^*(s_i) = b_i \cup f_i^*(g_i) \cup f_i^*(d_i)$                    ②   // closure condition
    THM $g_n = \emptyset \Rightarrow f_n^*(s_n) = b_n \cup f_n^*(d_n)$       ③   // intermediate termination condition

Fig. 16. Introducing "dirty" nodes



This specification can be implemented by a Collector that successively treats the gray nodes in $g_i$ until this set becomes empty (which can be guaranteed). But, in contrast to the earlier algorithms, this does not yet mean that all live nodes have been found. As theorem ③ shows, we still have to compute $f_n^*(d_n)$. But this additional calculation tends to be short in practice, and the Mutator can be stopped during its execution. Consequently, correctness has been retained and termination has been ensured.

The Mutator now adds "critical" nodes to the "dirty" set $d_i$. In order to keep the set $d_i$ as small as possible, one does not add all potentially critical nodes to it: as follows from axiom ②, black or gray nodes need not be put into $d_i$. And since $d_i$ is a set, nodes need not be put into it repeatedly. Actually, when the Mutator executes addArc$(a, b)$ with $a \notin s_i$ ("$a$ is still before the wavefront"), then axiom ② allows us the choice of putting $a$ into $d_i$ or not (similarly for $b$). Commonly, $a$ is simply added to $d_i$.
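A minimal Python sketch of a Collector along the lines of Figure 16 follows. The barrier policy is one possible choice and an assumption of the sketch: on addArc(a, b), a node $a$ that is not yet in $s_i$ is put into the dirty set (the common choice mentioned above), and if $a$ is already black, a target $b$ not yet in $s_i$ is made dirty, so that the new arc cannot hide it. The concurrent phase drains only the gray set; the cleanup, run with the Mutator stopped, computes $b_n \cup f_n^*(d_n)$ as in theorem ③. All class and function names are hypothetical.

```python
import random
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]


def closure(graph: Graph, s: Set[str]) -> Set[str]:
    seen, work = set(s), list(s)
    while work:
        for b in graph[work.pop()]:
            if b not in seen:
                seen.add(b)
                work.append(b)
    return seen


class DirtySetCollector:
    """black, gray and dirty partition the current approximation s_i."""

    def __init__(self, graph: Graph, roots: Set[str]) -> None:
        self.graph = {a: set(bs) for a, bs in graph.items()}
        self.black: Set[str] = set()
        self.gray: Set[str] = set(roots)
        self.dirty: Set[str] = set()

    def _in_s(self, n: str) -> bool:
        return n in self.black or n in self.gray or n in self.dirty

    def add_arc(self, a: str, b: str) -> None:
        """Mutator's addArc with a write barrier that records critical nodes."""
        self.graph[a].add(b)
        if not self._in_s(a):
            self.dirty.add(a)          # a before the wavefront: commonly added
        elif a in self.black and not self._in_s(b):
            self.dirty.add(b)          # otherwise b could hide behind black a

    def del_arc(self, a: str, b: str) -> None:
        self.graph[a].discard(b)       # no barrier in this variant

    def step(self) -> None:
        """Coarse step on the Collector's own gray set; dirty nodes stay put."""
        x = min(self.gray)
        self.black.add(x)
        self.gray = (self.gray | self.graph[x]) - self.black - self.dirty

    def finish(self) -> Set[str]:
        """Cleanup with the Mutator stopped: b_n | f_n*(d_n)."""
        assert not self.gray           # premise g_n = {} of theorem 3
        self.gray, self.dirty = self.dirty, set()
        while self.gray:
            self.step()
        return set(self.black)


def random_interleaving(seed: int, size: int = 8) -> Tuple[Set[str], Set[str]]:
    """Random Collector steps and Mutator actions on live nodes.
    Returns (marked nodes, live nodes of the final graph)."""
    rng = random.Random(seed)
    nodes = [f"n{k}" for k in range(size)]
    c = DirtySetCollector({a: set(rng.sample(nodes, 2)) for a in nodes}, {"n0"})
    while c.gray:
        if rng.random() < 0.5:
            c.step()
        else:
            live = sorted(closure(c.graph, {"n0"}))
            a = rng.choice(live)
            if c.graph[a] and rng.random() < 0.5:
                c.del_arc(a, rng.choice(sorted(c.graph[a])))
            else:
                c.add_arc(a, rng.choice(live))
    return c.finish(), closure(c.graph, {"n0"})
```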

4.3 Implementing the Step $s_i \mapsto s_{i+1}$

So far all our specifications only impose constraint ② (see MicroStep in Figure 12) on their implementations, that is:

$s_i < s_{i+1} \le f_i(s_i) \vee s_i = f_i(s_i)$

The actual computation of the step $s_i \mapsto s_{i+1}$ has to be implemented by some function step. For this function we can choose different degrees of granularity:

- In a coarse-grained implementation we pick some node x from the gray workset and add all 
its non-black successors to the workset. Then we color x black. 

This variant is simpler to implement and verify, but it entails a long atomic operation. The 
corresponding write barrier slows down the standard working of the Mutators. 

- In a fine-grained implementation we treat the individual pointer fields within the current 
(gray) node x one-by-one. In our abstract setting this means that we work with the individual 
arcs. 

This makes the write barrier shorter and thus increases concurrency, but the implementation 
and its correctness proof become more intricate. 

On our abstract level we treat this design choice by way of two different refinements. This is depicted in Figure 17 (where the shorthand notation ... USING x WITH p(x) entails that the property only has to hold when such an x exists).

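A note of caution: If we apply the morphism $\Phi$ introduced at the beginning of Section 4 directly, the strict inclusion $s_i < s_{i+1}$ of axiom ② (Figure 12) would not be provable, since a step may merely recolor nodes without changing the set $b \cup g$. Therefore we must interpret the order on the pairs $(b, g)$ lexicographically:

$(b, g) < (b', g') \Leftrightarrow b \subset b' \vee (b = b' \wedge g \subset g')$.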
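The two step functions of Figure 17 and the lexicographic order can be sketched in Python as follows (a hypothetical rendering; the concrete choice of $x$ and $y$ is fixed deterministically only for reproducibility). The driver `mark` asserts that every step is strictly increasing in the lexicographic order. The last test case shows a blackening step of FineStep that leaves $b \cup g$ unchanged, which is exactly why plain inclusion on $s_i$ would not suffice.

```python
from typing import Callable, Dict, Set, Tuple

Graph = Dict[str, Set[str]]
State = Tuple[Set[str], Set[str]]   # (b, g): black and gray nodes


def lex_less(p: State, q: State) -> bool:
    """(b, g) < (b', g')  iff  b < b'  or  (b == b' and g < g')."""
    (b, g), (b2, g2) = p, q
    return b < b2 or (b == b2 and g < g2)


def coarse_step(graph: Graph, b: Set[str], g: Set[str]) -> State:
    """CoarseStep: (b + x, (g | sucs(x)) - (b + x)) for some x in g."""
    x = min(g)
    nb = b | {x}
    return nb, (g | graph[x]) - nb


def fine_step(graph: Graph, b: Set[str], g: Set[str]) -> State:
    """FineStep: gray one target y of an arc x -> y with x gray, if possible;
    otherwise every gray x has sucs(x) <= b | g, so blacken one of them."""
    for x in sorted(g):
        for y in sorted(graph[x]):
            if y not in b and y not in g:
                return b, g | {y}
    x = min(g)
    return b | {x}, g - {x}


def mark(graph: Graph, roots: Set[str],
         step: Callable[[Graph, Set[str], Set[str]], State]) -> Set[str]:
    state: State = (set(), set(roots))
    while state[1]:
        nxt = step(graph, *state)
        assert lex_less(state, nxt)    # each step strictly increases (b, g)
        state = nxt
    return state[0]
```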

SPEC DirtySet, refined alternatively into:

SPEC CoarseStep
    $\mathit{step}: \mathit{Set}(\mathit{Node}) \times \mathit{Set}(\mathit{Node}) \to \mathit{Set}(\mathit{Node}) \times \mathit{Set}(\mathit{Node})$
    $\mathit{step}(b, g) = (b \oplus x, (g \cup \mathit{sucs}(x)) \setminus (b \oplus x))$
        USING $x$ WITH $x \in g$

SPEC FineStep
    $\mathit{step}: \mathit{Set}(\mathit{Node}) \times \mathit{Set}(\mathit{Node}) \to \mathit{Set}(\mathit{Node}) \times \mathit{Set}(\mathit{Node})$
    $\mathit{step}(b, g) = (b, g \oplus y)$
        USING $x, y$ WITH $x \in g$, $(x \to y) \in \mathit{Arcs}$, $y \notin (b \cup g)$
    $\mathit{step}(b, g) = (b \oplus x, g \ominus x)$
        USING $x$ WITH $x \in g \wedge \mathit{sucs}(x) \subseteq (b \cup g)$

Fig. 17. Step functions of different granularities

But there are still further implementation decisions to be made. Both CoarseStep and FineStep specify (at least partly) how the step operation deals with the selected gray node. But this still leaves one important design decision open: How are the gray nodes selected? In the literature we find several approaches to this task:

1. Iterated scanning. One may proceed as in the original paper by Dijkstra et al. [8] and repeatedly scan the heap, while applying step to all gray nodes that are encountered. This has the advantage of not needing any additional space, but it may lead to many scans over the whole heap, with a worst-case cost of $O(N^2)$, and is not considered practical.

2. Alternatively one performs the classical recursive graph traversal, which may equivalently be realized by an iteration with an explicit workset (a stack, in the case of recursion). Varying the workset discipline yields all the well-known variations, ranging from a stack for depth-first traversal to a queue for breadth-first traversal. In any case the time cost is in the order of $O(|\mathit{live}|)$, since only the live nodes need to be scanned. However, there also is a worst-case need for $O(|\mathit{live}|)$ space, and space is a scarce resource in the context of garbage collection.

3. One may compromise between the two extremes and approximate the workset by a data structure of bounded size (called a cache in [9,10]). When this cache overflows, one has to fall back on further scan rounds.

4. When there are multiple mutators, it is necessary for efficiency to have local worksets working concurrently.
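The compromise of item 3 can be illustrated by the following hypothetical Python sketch: the mark table records every gray node, but only up to `capacity` of them fit into the explicit workset ("cache"); overflowing nodes stay gray and are picked up by a later scan over the whole heap. A capacity at least as large as the number of live nodes behaves like the workset traversal of item 2 (a single confirming scan), while a very small capacity moves toward the repeated heap scans of item 1. The names and the color encoding are assumptions, not taken from [9,10].

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def mark_bounded(graph: Graph, roots: Set[str], capacity: int) -> Tuple[Set[str], int]:
    """Returns (black nodes, number of heap scans)."""
    assert capacity >= 1
    color = {a: "white" for a in graph}
    cache: List[str] = []
    scans = 0

    def shade(a: str) -> None:
        if color[a] == "white":
            color[a] = "gray"          # the mark table always records gray
            if len(cache) < capacity:
                cache.append(a)        # on overflow a is only remembered as gray

    for r in sorted(roots):
        shade(r)
    while True:
        while cache:
            x = cache.pop()
            for y in sorted(graph[x]):
                shade(y)
            color[x] = "black"
        scans += 1                     # one scan over the whole heap
        grays = [a for a in sorted(graph) if color[a] == "gray"]
        if not grays:
            return {a for a in graph if color[a] == "black"}, scans
        cache.extend(grays[:capacity])
```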

These design choices are illustrated in Figure 18. (But we refrain from coding all the technical details.)

It should be emphasized that the refinements of Figure 18 are independent of the refinements of Figure 17. This means that we can combine them in any way we like. The combination of a refinement from Figure 17 with one from Figure 18 is formally achieved by a pushout construction, as already mentioned in Section 1.2. In a system like Specware [16] such pushouts are performed automatically.

It should be noted that theorem ③ in Figure 16 requires at least one scan in order to perform the cleanup $f_n^*(d_n)$ of the dirty nodes after the main marking phase is completed.
[Figure: SPEC DirtySet refined alternatively into SPEC IteratedScan, SPEC Recursion, SPEC BoundedCache and SPEC LocalWorkSets.]

Fig. 18. Design choices for finding the gray nodes



The Collector may also compute (parts of) $f_i^*(d_i)$ at any given point in time concurrently with the Mutators. This may make the finally remaining dirty set $d_n$ smaller and thus speed up the cleanup operation, which shortens the necessary pauses for the Mutators. These interim computations are harmless as long as the termination of the Collector's main marking phase does not depend on the set $d_i$ becoming empty.



4.4 Heap Partitioning (Generations, Cards, Pages etc.) 

The necessary scanning of the "dirty nodes" described in Section 4.3 above motivates a refinement that actually has a whole variety of different applications. In other words, the following abstract refinement is the parent of a number of further refinements that aim at solving different kinds of problems.

In the following we briefly sketch how such alternative refinements fit into our abstract framework. In Sections 6.3 and 6.4 we will discuss some concrete applications of this paradigm, in particular

- generational garbage collectors; 

- dirty cards; 

- dirty pages. 

We integrate these techniques into our approach by means of superimposing a further structuring 
on the graph. This has already been hinted at in Figures 6-9 in Section 2.5, where the planes 
are further partitioned into areas. 

Definition 1 (Graph partitioning). A partitioning of the graph is given by splitting the set of nodes into disjoint subsets:

$\mathit{Nodes} = N_1 \uplus \ldots \uplus N_k$

The subsets are called cards (following the terminology of e.g. [24]).

These cards can be used to optimize the computation of the dirty nodes without compromising 
the correctness of the algorithm. To this end we need to introduce the constraint shown in the 
specification DirtyCards in Figure 19. 
SPEC DirtyCards
EXTEND DirtySet
TYPE Card = Set(Node)
$N_1, \dots, N_k$: Card                          // cards
$Nodes = N_1 \uplus \dots \uplus N_k$             // cards partition the node set
dirty: Card $\to$ Bool                            // "dirty" property of cards
$d_i \subseteq \bigcup \{ N_j \mid dirty(N_j) \}$  // constraining dirty nodes

Fig. 19. Introducing "dirty" cards



The axiom establishes the constraint that the dirty nodes can only lie on dirty cards. (This 
constraint has to be obeyed by the Mutator.) 

The corresponding axiom in Figure 16 requires the cleanup computation of $f_n(d_n)$ after the main marking
phase has been completed. For this cleanup the new axiom of Figure 19 entails a considerable speedup,
since only the dirty cards, a subset of all cards $N_j$, need to be scanned.
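The following Python sketch shows how the dirty-card axiom might be exploited in the cleanup. It assumes integer node addresses, fixed-size cards, and a hypothetical card-marking write barrier in `add_arc` that sets the dirty flag of the card holding the modified node, so that every dirty node lies on a dirty card. The cleanup then rescans only marked nodes on dirty cards; all names are illustrative.

```python
from collections import defaultdict

CARD_SIZE = 512  # hypothetical card size in bytes

def card_of(addr):
    return addr // CARD_SIZE

class Heap:
    def __init__(self, graph):
        self.graph = graph                  # address -> set of successor addresses
        self.cards = defaultdict(set)       # card index -> addresses on that card
        for addr in graph:
            self.cards[card_of(addr)].add(addr)
        self.dirty_cards = set()

    def add_arc(self, src, dst):
        """Mutator update; the barrier dirties the card of the modified node."""
        self.graph[src].add(dst)
        self.dirty_cards.add(card_of(src))

def cleanup(heap, marked):
    """Rescan marked nodes on dirty cards only; return how many were rescanned."""
    work = [a for c in heap.dirty_cards for a in heap.cards[c] if a in marked]
    rescanned = len(work)
    while work:
        for succ in heap.graph[work.pop()]:
            if succ not in marked:
                marked.add(succ)
                work.append(succ)
    heap.dirty_cards.clear()
    return rescanned

heap = Heap({0: {600}, 64: set(), 600: set(), 1100: set(), 1500: set()})
marked = {0, 600}              # result of the main marking phase from root 0
heap.add_arc(600, 1100)        # Mutator update after 600 was scanned
rescanned = cleanup(heap, marked)
```

Cards 0 and 2 are never touched during the pause. The trade-off is that one flag per card replaces exact knowledge of the dirty nodes, at the price of rescanning every marked node on a dirty card.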

5 Dynamic Root Sets 

In the previous sections we have performed the transition from classical fixed points in static
graphs (relating to stop-the-world collectors) to dynamic fixed points in changing graphs (relating
to concurrent collectors). However, we still rely on the inherent assumption that the Collector
starts from a fixed root set $r$. Alas, this assumption, which is also contained in the original
papers by Dijkstra, Steele and others [8, 28], cannot be maintained in practice for the
following reasons (see e.g. [9, 10, 1]):

1. The local data of the Mutator (registers and stack) are indeed local; that is, the Collector
has no access to them!

2. The synchronization between the Collector and the Mutators requires write barriers. The 
corresponding overhead may be tolerable for heap accesses, but it is certainly out of the 
question for local stack or register operations. 

As a consequence of the first observation the Mutators have to participate actively in the garbage 
collection process, at least during the start phase of each collection cycle. And the second 
observation rules out certain solutions due to their unacceptable overhead. So the challenge 
is to maximize concurrency in the presence of these constraints. 

In the following we will show how these issues fit into our overall framework.

5.1 Modeling Local Data

In accordance with our earlier levels of abstraction we devise the following model for the
Mutator-local data: all local data of a Mutator (registers and stack) are considered as one large
node, which we call the pre-root $p_m$. The potentially quite large set $sucs(p_m)$ then represents
all pointers out of the local stack and registers into the heap.*

From now on let us assume that there are $q$ Mutators $M_1, \dots, M_q$ with pre-roots $p_1, \dots, p_q$.
The global variables are represented by the pre-root $p_0$; they are accessible by the Collector.

Definition 2 (Pre-roots; local roots). Each Mutator $M_m$ possesses a pre-root $p_m$. The successors
of this pre-root are referred to as the Mutator's local roots: $r_m = sucs(p_m)$.

Moreover, the global variables (which are accessible to the Collector) are represented by the
pre-root $p_0$ and the corresponding roots by $r_0 = sucs(p_0)$.

The set of all pre-roots is denoted as $p = \{p_0, p_1, \dots, p_q\}$. The set of all roots is defined as
$r = sucs(p) = r_0 \cup \bigcup_{m \in Mut} r_m$.
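As a small illustration of Definition 2, the following Python sketch computes the local roots and the overall root set of a toy graph in which each pre-root is just another node. The node names are hypothetical, and arcs are kept as plain successor sets rather than the multisets of our model, which makes no difference when computing $sucs$.

```python
def sucs(graph, nodes):
    """sucs(S): the union of the successor sets of all nodes in S."""
    result = set()
    for n in nodes:
        result |= graph.get(n, set())
    return result

graph = {
    "p0": {"g"},        # global variables, accessible to the Collector
    "p1": {"a", "b"},   # Mutator M1: registers and stack as one pre-root
    "p2": {"b", "c"},   # Mutator M2
    "a": {"d"}, "b": set(), "c": set(), "d": set(), "g": set(),
}
p = {"p0", "p1", "p2"}
local_roots = {m: sucs(graph, {m}) for m in p}   # r_0, r_1, r_2
r = sucs(graph, p)                               # r = sucs(p)
```

Note that `d` is reachable but is not a root, and that the pre-roots themselves are not part of `r`.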

The set $r$ introduced in Definition 2 above essentially plays the role of the start value $s_0$ in the
specification MicroStep of Figure 12. Hence, we might rephrase the central property of this
specification (after applying the morphism $\Phi$ from Section 4):

$\exists n:\ f_n(r) \subseteq s_n \subseteq f_0(r)$ WHERE $r = sucs(p) = sucs(p_0) \cup \bigcup_{m \in Mut} sucs(p_m)$

However, due to the subtle difficulties that are caused by the concurrent activities of Collector
and Mutators, we should retreat to a more fundamental rephrasing of our overall task.

5.2 Computation of the Roots 

The abstract modeling introduced in Definition 2 at the beginning of this section allows us to
retain all the other modeling aspects of the preceding sections. In particular the concept of
varying graphs $G_0, G_1, G_2, \dots$ and the thus induced functions $f_0, f_1, f_2, \dots$ can be applied to
the computation of the root set $r$ without changes.

However, for reasons that will become clear in a moment, we need to make one change. Since the
local registers and stacks of the Mutators are not part of the heap, they must not undergo the
mark-and-sweep or the copying process. In our mathematical model we therefore no longer
consider the reflexive-transitive closure but only the non-reflexive transitive closure, which
we denote as $f^{+}$. As a consequence, the function $f$ should not be inflationary either. Hence, the
morphism $\Phi$ at the beginning of Section 4 now sets $f_i(s) = \bigcup_{a \in s} G_i.sucs(a)$. Consequently, the
computation-step axiom in the specification MicroStep has to be changed to

$s_i \le s_{i+1} \le s_i \cup f_i(s_i) \;\vee\; f_i(s_i) \le s_i$
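The switch to a non-reflexive closure can be checked on a toy example. The Python sketch below (all names hypothetical) implements the one-step function $f(s) = \bigcup_{a \in s} G.sucs(a)$ and iterates the modified computation step until $f(s) \le s$, always taking the largest allowed step $s \cup f(s)$. A pre-root is then excluded from the result unless it is itself reachable via some arc, which is what the treatment of the Mutators' local data requires.

```python
def f(graph, s):
    """One step: the union of the successors of all nodes in s (not inflationary)."""
    out = set()
    for a in s:
        out |= graph.get(a, set())
    return out

def closure_plus(graph, x):
    """Non-reflexive transitive closure: least s with f(x) <= s and f(s) <= s."""
    s = f(graph, x)
    while not f(graph, s) <= s:     # stop in the termination case f(s) <= s
        s = s | f(graph, s)         # any s' with s <= s' <= s | f(s) would be allowed
    return s

graph = {"p": {"a"}, "a": {"b"}, "b": {"a"}, "c": set()}
live = closure_plus(graph, {"p"})   # {"a", "b"}; the pre-root p itself is excluded
```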

With these changes (for which all previous proofs work unchanged except for minor notational
adaptations) we can now reformulate the garbage collection task slightly differently (see
Figure 20, where the numbers are the same as in the original specifications).

* Here it pays that we model pointers more abstractly as a (multi)set of arcs. This is much more concise and
elegant than speaking about "objects" and their "slots" for pointers, as is usually done in the literature.

SPEC MicroStep'
$f^{+}(x) =$ LEAST $s$: $f(x) \le s \wedge f(s) \le s$            // transitive closure
$s_0, s_1, s_2, \dots : A$                                           // sequence of approximations
$s_i \le s_{i+1} \le s_i \cup f_i(s_i) \;\vee\; f_i(s_i) \le s_i$     // computation step
THM $\exists n:\ f^{+}_n(p) \le s_n \le f^{+}_0(p)$                  // $live_n \subseteq s_n \subseteq live_0$

Fig. 20. The "micro-step approach"

A major difference between this specification MicroStep' and the original specification MicroStep
is the omission of the axiom that determines the start value $s_0$. This start value is no
longer a constant, but now has to be computed from the pre-roots. This computation is slightly
intricate, since the graph is undergoing continuous changes. More precisely, we have the
following scenario:

- There are $q$ Mutators $M_1, \dots, M_q$.

- When Mutator $M_i$ computes its local roots $r_i$ from its pre-root $p_i$, the graph is in some
stage $G_j$.

- During the root computation the Mutator stops its other activities, and the other Mutators
cannot access its local data. Hence the outgoing arcs from the local data into the heap
remain unchanged throughout the local root computation, and we obtain
$r_i = f_j(p_i) = G_j.sucs(p_i)$.**

Based on these observations, the set $s_0 = r$ that is computed by the Mutators essentially is

$r = f_0(p_0) \cup f_1(p_1) \cup \dots \cup f_q(p_q)$     // too naive

However, this naive approach does not always work, as we show next.

A problem. Looking at our central correctness property, we would wish that the equality
$f(p) = r$ holds. Alas, this is not necessarily the case. To see this, consider the example of
Figure 21 (adapted from [9]).




Fig. 21. A potential error 



** Note that this does not mean that the graph remains invariant throughout the computation of the local root
set $r_i$. On the contrary, the other Mutators will usually change the graph continuously. But these changes do
not affect the specific set $G_j.sucs(p_i)$.
Suppose $M_1$ has computed its (only) local root $a$, i.e. $r_1 = \{a\}$ (left side of Figure 21). Then
it resumes its normal activities, which happen to load a pointer to $b$ into a local variable or
register; this is modeled as $addArc(p_1, b)$ on the right side of Figure 21. When the Collector now
starts its recycling activities, we have $b \notin r_1$, even though $b \in sucs(p_1)$. So the Collector starts
from a wrong root set.

It is easily seen that this can indeed be disastrous: suppose that before the start of the Collector
some Mutator $M_2$ (or $M_1$ itself) deletes the pointer from $a$ to $b$ (right side of Figure 21). Then
$b$ will be considered garbage, even though it is still reachable from $M_1$.
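The failure can be replayed in a few lines of Python. This is a sketch assuming a sequential replay of the interleaving in Figure 21, with the graph as a dictionary of successor sets and a hypothetical helper `reachable` standing in for the Collector's marking; it is not the formal model itself.

```python
def reachable(graph, start):
    """Reflexive-transitive closure of `start`: what the Collector marks."""
    seen, stack = set(start), list(start)
    while stack:
        for succ in graph[stack.pop()]:
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen

# Left side of Figure 21: M1's only local root is a, and a points to b.
graph = {"p1": {"a"}, "a": {"b"}, "b": set()}
r1 = set(graph["p1"])            # M1 computes its local roots: {"a"}

graph["p1"].add("b")             # addArc(p1, b): M1 loads b into a register
graph["a"].discard("b")          # delArc(a, b): some Mutator deletes a -> b

marked = reachable(graph, r1)                  # Collector starts from stale roots
actually_live = reachable(graph, graph["p1"])  # what M1 can really reach
```

Here `marked` is `{"a"}` while `actually_live` also contains `b`, so `b` would be reclaimed although $M_1$ still holds a pointer to it.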

This is the same problem as the one that we had already encountered in Figure 9 of Section 2.5. 
However, there is a difference: the problematic pointers now are not caused by heap operations 
but by operations on the local variables and registers. For reasons of efficiency we do not want 
to slow down these local operations by wrapping them into read or write barriers. This would 
be particularly unpleasant, since the protection is only needed during an extremely short phase
(namely the root marking), while the overhead would be incurred permanently.

In [9] further problems are illustrated that could occur when the Mutator is, e.g., interrupted
between the $addArc(p_1, b)$ and the $delArc(a, b)$ operation for such a long time that the Collector
performs a whole collection cycle.

Towards a solution. We enforce the invariant

$f_i(p) \subseteq f_i(s_i)$.

Then, when the computation terminates in the fixpoint case $s_n \supseteq f_n(s_n)$, the fixpoint properties
immediately entail

$f_n(p) \subseteq s_n$.

This can be rephrased as

$live_n \subseteq s_n$,

which is the main correctness criterion for the collector (as was stated in the central correctness
property of Figure 12).

When looking at the critical situation of Figure 21, we can see that there are three conceivable
solutions for establishing this requirement:

1. We can stop all other activities of the Mutators during the local-root computation.

This is very safe, but it clearly has the disadvantage of causing long pauses. In [9] it is shown 
how this solution can be achieved using three handshakes that ensure that the Collector and 
all Mutators are in sync. 

2. We could request that the operation $addArc(p_1, b)$ puts $b$ into the set $s_i$.

This is a correct solution, but it has the disadvantage that we need a read barrier for loading 
heap pointers into local variables or registers. Even though this read barrier will be a single 
if-test during most of the time, it does add overhead. 

In some systems there is not even enough information to distinguish pointers from other values, so that a
large number of operations would have to be wrapped in this protective overhead of barriers.
3. We could request that the operation $delArc(a, b)$ puts $b$ into the set $s_i$.

This is a correct solution, which needs a write barrier on heap operations. This is not so
bad, since this write barrier already exists for handling the other problems encountered in
the previous sections. Still, there are disadvantages: marking nodes at the very moment of
their deletion will create floating garbage in the majority of cases. Moreover, the operation
$delArc(a, b)$ has to touch the node $a$, since it changes one of its slots; but the marking
additionally has to touch another node, namely $b$. This adds to the overhead.

We should mention that a fourth kind of solution is possible in the "snapshot-at-the-beginning"
approaches (see Section 3.4). This is demonstrated in the so-called "sliding-views" approach of [1].

4. The operation $delArc(a, b)$ produces a clone of the node $a$.

The advantage here is that less floating garbage is generated and that only the object $a$ needs
to be touched by the write barrier. But there is the overhead of storing and managing
the snapshot.
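For comparison, here is the scenario of Figure 21 replayed in Python with solution 3 from the list above: a hypothetical deletion barrier in `del_arc` puts the target of every deleted heap arc into the Collector's set, while the local load $addArc(p_1, b)$ stays barrier-free. This is a sketch of the idea only, with illustrative names.

```python
def mark(graph, s):
    """Extend the set s to everything reachable from it in the current graph."""
    stack = list(s)
    while stack:
        for succ in graph[stack.pop()]:
            if succ not in s:
                s.add(succ)
                stack.append(succ)
    return s

graph = {"p1": {"a"}, "a": {"b"}, "b": set()}
s = set(graph["p1"])             # s_0 = r_1 = {"a"}, computed before the load

def del_arc(src, dst):
    """delArc with a deletion write barrier (solution 3): shade the target."""
    graph[src].discard(dst)
    s.add(dst)

graph["p1"].add("b")             # addArc(p1, b): local load, no barrier
del_arc("a", "b")                # heap update, goes through the barrier
mark(graph, s)                   # b survives: s == {"a", "b"}
```

The price named above is visible even here: had $M_1$ dropped its pointer to `b` right after the deletion, `b` would still survive this cycle as floating garbage.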

6 Real-World Considerations

It is well known that realistic garbage collectors, in particular concurrent or parallel ones,
exhibit a huge number of technical details that are ultimately responsible for the size and
complexity of the verification efforts. The pertinent issues cover a wide range of questions such
as:

- What are the exact read and write barriers? 

- How do we treat the references in the global variables, the stacks and the registers? 

- Where do we put the marker bits (in mark-and-sweep collectors) or the forward pointers (in 
copying collectors)? 

Evidently we do not have the space here to address these questions in detail. But we should at 
least indicate the path towards their solution within our method. Therefore we list here some of 
the technical features contained in realistic product-level collectors and show how they fit into 
our framework. 

6.1 The Mutators' Capabilities 

In Section 2.1 we have limited the Mutators' behavior to the three operations addArc, delArc
and addNew. By contrast, Doligez et al. [9, 10] use eight operations, some of which are modified
in other approaches [11]. We may group their operations as follows:

- move, load: local data transfers (stack, registers) and read access to heap cells; 

- reserve, create: obtain space from freelist; create cell in that space; 

- fill, update: write into a new/existing cell; 
- cooperate, mark: synchronize with Collector; mark local roots.

This design breaks the usual allocate operation into the three separate operations reserve, create,
fill. Moreover, it treats fill differently from update, since a new object is guaranteed to be still
unknown to other Mutators and therefore can be handled with less synchronization overhead. (But
in the approach of [11] this distinction is abolished.) This diversity and granularity are important
for concrete discussions about issues such as Mutator-local mini heaps and other implementation
details. But for our modeling approach and its refinement and correctness considerations, our
three basic operations cover the essential aspects. Technically speaking, the eight operations of
Doligez et al. can be obtained by refining our low-level models even further.

Moreover, Doligez et al. [9, 10] and many of their successor papers use a notation like heap[x, i]
to refer to the i-th slot in the object x in the heap. Our model achieves the same effect with a
more mathematical attitude by considering individual arcs $(a \to b)$. In other words, $(a \to b_1), \dots,
(a \to b_k)$ model the slots of the object $a$. Our notation allows a more flexible treatment of
the question of whether objects are uniform or can have varying numbers of successor slots.
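To make the correspondence concrete, the following Python sketch converts a hypothetical slot-based heap (a list of slots per object, with `None` for a null pointer) into a multiset of arcs, using `collections.Counter` as the multiset. Objects may have different numbers of slots, and two slots pointing to the same target yield an arc with multiplicity two; the function names are illustrative.

```python
from collections import Counter

# Slot view: heap[x][i] is the i-th slot of object x (None = null pointer).
heap = {"a": ["b", "c", "b"], "b": [None], "c": ["a", None]}

def to_arcs(heap):
    """Multiset of arcs (x -> y), one occurrence per non-null slot."""
    return Counter((x, y) for x, slots in heap.items() for y in slots if y is not None)

def del_arc(arcs, x, y):
    """Remove one occurrence of the arc (x -> y), if present."""
    if arcs[(x, y)] > 0:
        arcs[(x, y)] -= 1
        if arcs[(x, y)] == 0:
            del arcs[(x, y)]

arcs = to_arcs(heap)
del_arc(arcs, "a", "b")          # overwrite one of the two b-slots of a
```

After the deletion, `a` still has one arc to `b`, which mirrors the fact that only one of its two slots was overwritten.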

6.2 Availability of a Runtime System 

Section 2.1 introduces an architecture with a component Store that represents the memory
management used in modern runtime systems such as .NET or the JVM, based on old ideas from the
realm of functional languages such as ML or Haskell. Such a component provides only indirect
access to the heap, so that it is relatively easy to integrate read or write barriers, handshakes
and other organizational means. The vast majority of the newer papers therefore target these
kinds of architectures.

In "uncooperative languages" such as C or C++ things are more intricate, and it is not surprising
that only a few papers address their demands, e.g. [3]. The difficulties caused by such uncooperative
languages are overcome by using the capabilities of the underlying operating system, such
as the page table, to introduce concepts like dirty pages. Moreover, one has to deal with lots of
floating garbage, since, due to the lack of typing information, many non-pointer values have
to be treated conservatively as if they were pointers. Last but not least, there are also more
intricate synchronization issues between the Collector and the Mutators. Benchmarks indicate that
the overhead in such uncooperative languages is considerably higher than that in cooperative
languages.

6.3 Generational Collectors 

In Section 4.4 we have shown that we can superimpose an additional structure on the graph
such that the set of nodes is partitioned into subsets: $Nodes = N_1 \uplus \dots \uplus N_k$. Such a partitioning
is fully compatible with our correctness considerations and the pertinent refinements:

Generational garbage collectors partition the nodes according to their "age", where the age 
usually reflects the number of collection cycles that the node has survived. 

Whereas in traditional garbage collectors the nodes are physically moved to another area, this
is no longer possible in concurrent collectors due to the large overhead that the pointer tracking
and synchronization would require. Therefore one usually only defines the generations logically.
A non-moving solution for concurrent collectors is presented by Domani et al. [12], based on
earlier work of Demers et al. [7].

Printezis and Detlefs [24] describe a concurrent generational collector that has been implemented
as part of the SUN Research JVM. The different generations often use different collectors. For
the young generation, where nodes tend to die fast, copying collectors are the technique of choice,
whereas in older generations a mark-and-sweep approach works better [24].

The most critical issue in generational collectors is the treatment of old-to-young references,
since the whole point of generations is NOT to touch the older generations in most collection
cycles. To cope with these references (which are caused by the Mutators' activities) one usually
employs the dirty-cards or dirty-pages techniques that we discuss in the next section.

6.4 "Dirty Areas" (Cards, Pages, . . . ) 

As has been pointed out in Section 4.4 the disturbances caused by the Mutators can be en- 
capsulated into a concept of "dirty areas", which allows a compromise between accuracy and 
efficiency. 

- Cards are often used to speed up the scanning by constraining it to memory areas that
actually may need scanning. This is done by using a dirty bit for each card on which a
Mutator has performed some update: when the dirty bit is set, the card needs to be scanned.
Cards and dirty bits are used (possibly under a different name) in connection with generational
garbage collectors [24] but also to cope with problems caused by multiple Mutators.
They can be based on very efficient write barriers. For example, the SUN Research VM [24]
uses the two-instruction write barrier proposed in [30].

- Pages taken from the virtual-page mechanism of the underlying OS together with dirty bits
can be used for uncooperative languages like C and C++, where no runtime system exists
that nicely separates the application programs from the memory management [3]. However,
as is reported in [14], the use of cards is more efficient than the use of page-protection-based
barriers.

6.5 Privacy of Local Data; Local Heaps 

One of the primary goals of many concurrent-collector designs is to keep the pause times of the
Mutators to a minimum (in practice within an order of magnitude of 2 ms; see e.g. [1]). Here the
greatest obstacle is the global synchronization, when the Mutators derive the local roots from their
pre-roots (local registers and stack). The longest time that a thread waits for garbage collection
is the time it needs to mark the objects directly reachable from its stack [11].

Based on the observation that most of the heap dynamics comes from short-lived
and small objects, the global synchronization can be made less frequent by providing every
Mutator with its own local "mini heap" [9, 13]. Then the Mutator only needs to interact with
the global heap in order to acquire a new mini heap when the old one is full, or in order to store
large objects. Otherwise it uses a very simple allocation scheme in its local heap, e.g. the
so-called "bump-pointer" technique [13]. This design, which is used e.g. in the IBM JVM, also
has the advantage of being cache-friendly [2, 4, 15].
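Bump-pointer allocation in such a mini heap can be sketched as follows in Python. The class name, the byte-granular addresses and the convention of returning `None` when the mini heap is exhausted (so that the caller acquires a new mini heap from the global heap) are assumptions for illustration; alignment and object headers are ignored.

```python
class MiniHeap:
    """A Mutator-local allocation area using bump-pointer allocation."""

    def __init__(self, start: int, size: int):
        self.top = start            # next free address
        self.limit = start + size   # end of the mini heap (exclusive)

    def allocate(self, nbytes: int):
        """Return the address of a fresh cell, or None if the mini heap is full."""
        if self.top + nbytes > self.limit:
            return None             # caller must acquire a new mini heap
        addr = self.top
        self.top += nbytes          # "bump" the pointer; no global synchronization
        return addr

mh = MiniHeap(start=4096, size=64)
first = mh.allocate(16)    # 4096
second = mh.allocate(32)   # 4112
third = mh.allocate(32)    # None: only 16 bytes are left
```

Since the mini heap belongs to a single Mutator, allocation is just a comparison and an addition, and consecutive allocations are adjacent in memory.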

6.6 Detailed Memory Management 

There is a plethora of little details to be considered in real-world garbage collectors, of which 
the following list gives but a small selection. 

- Object size. The original garbage collectors by McCarthy [19] or Dijkstra [8] are based on the
assumption of fixed-size heap cells. But in reality these cells come in all sizes and internal
layouts. There are all kinds of solutions to this problem, such as using freelists of different sizes,
splitting large objects into smaller pieces, using Mutator-local heaps for small objects and
so forth. The size information is either stored with the object or inferred from the object's
class.

- Coalescing. The sweeping phase should try to combine consecutive free chunks into one large
free chunk in order to alleviate the fragmentation problem. This is easy in non-concurrent
collectors, but more complex in concurrent collectors, since now the Collector and the
Mutators compete for the same resource, namely for the removal of cells from the freelist
[24].

- Marker and pointer management. All the algorithms use various kinds of markers - for 
example the colors black, gray, white or the dirty bits - and various pointers - for example the 
forward pointers in copying collectors or the clone pointers in the "sliding-views" approaches. 
A typical technique for handling such problems is described e.g. in [13], where the vtable
pointer, which is the first word of every object, is overwritten with the forward pointer needed
in the copying collector. These pointers can be distinguished, since they point into disjoint
(and known) storage areas.

- Sets, markers and bitmaps. Many markers actually represent the membership of the node in 
a certain set. This can either be implemented by a marker bit in each object or by a global 
set representation using a bitmap. 
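The bitmap alternative can be sketched in Python as follows, assuming one bit per heap word and word indices rather than byte addresses; the class name is hypothetical. Membership tests and insertions reduce to shifts and masks, and the whole set can be cleared at the start of a cycle without touching the objects themselves.

```python
class MarkBitmap:
    """A set of heap words represented as a bitmap (one bit per word)."""

    def __init__(self, heap_words: int):
        self.bits = bytearray((heap_words + 7) // 8)

    def add(self, word: int) -> None:
        self.bits[word >> 3] |= 1 << (word & 7)

    def __contains__(self, word: int) -> bool:
        return bool(self.bits[word >> 3] & (1 << (word & 7)))

    def clear(self) -> None:
        """Empty the set, e.g. at the start of a new collection cycle."""
        self.bits[:] = bytes(len(self.bits))

marked = MarkBitmap(heap_words=100)   # 13 bytes of bitmap
marked.add(3)
marked.add(42)
```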

- Coping with "no-information". Most garbage collection approaches nowadays address the
JVM or .NET, where all memory accesses of application programs are indirect, since they
are handled by a memory manager. Therefore all the garbage collection activity can be
bundled in the runtime system. Old systems based on C or C++ do not offer this luxury.
There one may at best use work-arounds such as employing the virtual-paging mechanism
of the underlying OS by making pages "dirty" whenever they are written to [3]. Another
problem here is that many integer values have to be treated as if they were pointers, thus
producing a lot of floating garbage.

7 Conclusion 

We have shown how the main design concepts in contemporary concurrent collectors can be 
derived from a common formal specification. The algorithmic basis of the concurrent collectors 
required the development of some novel generalizations of classical fixpoint iteration theory. We 
hope to find a wide variety of applications for the generalized theory, as there has been for the 
classical theory. This is of interest since the reuse of abstract design knowledge across application
domains is a key factor in the economics of formal derivation technology. Alternative refinements
from the basic algorithm lead to a family tree of concurrent collectors, with shared ancestors
corresponding to shared design knowledge. While our presentation style has been pedagogical,
the next step is to develop the derivation tree in a formal derivation system, such as Specware.

Acknowledgment. We are grateful to Erez Petrank and Chris Hawblitzel, with whom one of 
us (pp) enjoyed intensive discussions at Microsoft Research. Their profound knowledge of the
challenges of practical real-world garbage collectors motivated us to push our original high-level 
and abstract treatment further towards concrete and detailed technical aspects - although we 
realize that we may still be on a very abstract level in the eyes of true practitioners. 